THE NETWORK PICTURE OF LABOR FLOW*

EDUARDO LOPEZ*, OMAR GUERRERO*, AND ROBERT L. AXTELL§ 

Abstract. We construct a data-driven model of flows in graphs that captures the essential 
elements of the movement of workers between jobs in the companies (firms) of entire economic systems 
such as countries. The model is based on the observation that certain job transitions between firms 
are often repeated over time, showing persistent behavior, and suggesting the construction of static 
graphs to act as the scaffolding for job mobility. Individuals in the job market (the workforce) are 
modelled by a discrete-time random walk on graphs, where each individual at a node can possess two 
states: employed or unemployed, and the rates of becoming unemployed and of finding a new job 
are node dependent parameters. We calculate the steady state solution of the model and compare 
it to extensive micro-datasets for Mexico and Finland, comprising hundreds of thousands of firms
and individuals. We find that our model possesses the correct behavior for the numbers of employed 
and unemployed individuals in these countries down to the level of individual firms. Our framework 
opens the door to a new approach to the analysis of labor mobility at high resolution, with the 
tantalizing potential for the development of full forecasting methods in the future. 
1. Introduction. High employment is one of the central goals of any economic
policy, because it is associated with the economic, social and political prosperity of
countries. Of the many perspectives that need to be considered to understand the
problem of employment, job search has attracted a large amount of interest for its
relatively well defined nature, and for the perception that economic policies can have an
important impact on its optimization; numerous important results have been obtained
and are well summarized in reviews such as [1, 2]. The main approach to understanding
job search is known as search and matching modeling [3, 4]. Search and matching
models broadly consist of a stochastic process by which two kinds of entities (e.g.,
unemployed individuals and vacancies) join to create a new match, and this joining
is mediated by a success rate called the aggregate matching function [1, 5]. These
models have been successful at predicting quantitative and qualitative features of
employment, and are well accepted [1].

However, despite their success, search and matching models have inherent limitations
in the way they are constructed. One of those limitations, the notion of
aggregation, eliminates from consideration the role played in the dynamics of employment
by specific companies (firms^1) in the economy. At first glance, this may not
seem critical because any country has a large number of firms, many of which are quite
similar. But upon more detailed consideration, one finds it is also true that in most
countries there are firms that play central roles, and when these particular firms are
affected by any number of factors such as technological change, new economic policies,
or competition, the impact on employment can be considerable and have downstream
effects on the entire economy of the country. Indeed, empirical evidence has shown
that the shocks experienced by the largest firms are responsible for the majority of
fluctuations in the total production of an economy, and not necessarily because of
firms' sizes, but because of the propagation effects through the entire system [6, 7].

For some decades, governments from several countries have stored highly granular 
micro-data about firms and workers, constructed from social security records [8]. 
However, detailed analysis of coupled firm and labor dynamics is not common in the 
economics literature, due to the limitations of commonly employed methods. Such 
data, in conjunction with the framework here proposed, offers the opportunity to 
uncover the specific roles that firms play in employment. 

By approaching the problem of job mobility in a novel way, and using data already
available, it is possible to construct more detailed models of job transitions with resolution
at the level of individuals and firms in entire countries, offering a new approach
to the study of labor dynamics. This is the focus of the current article. Specifically,
we introduce a stochastic process on graphs that accurately represents observed employment
and unemployment patterns in two comprehensive micro-datasets. Labor
mobility occurs within graphs (or networks), which summarize the constraints that
agents encounter while moving between jobs. These graphs are constructed as follows:
vertices (nodes) represent firms, and edges represent previously observed job transitions
between the firms (one or more workers changed jobs from one of the nodes to
the other in a chosen time period). Workers are modeled as performing a set of simple
decisions: when employed, they separate from their job with a firm-dependent probability,
and when unemployed they choose to apply to one of the neighboring firms on
the graph that is open to hire. These rules amount to a version of random walks on
graphs (for reviews on this topic, see e.g. [9, 10, 11]). Although the model is simple,
it is able to reconstruct relevant detailed employment features of the micro-data.

One of the advantages of such a model is that it can be directly calibrated from
real data down to the level of firms. This calibration, together with plausible scenarios
based on the introduction of economic policies, market or technological changes
affecting particular firms, or changes in labor laws, to name a few, can potentially
lead to more accurate and highly resolved forecasting of job mobility trends. Such a
forecasting tool would be very valuable for those responsible for economic and labor
policy-making.

As a technical consequence of our approach, we find it useful to introduce an innovative
concept: firm-specific unemployment. In the model, this concept is necessary
because individuals that have recently stopped working at a particular firm engage in
job searching only along the edges adjacent to their most recent employer, which we
term local search. Individuals then remain associated with a firm from the moment
their employment finishes to the date when they find a new job in a different firm.
This concept leads to a set of new considerations about the way in which we interpret
unemployment.

Finally, we present statistical evidence derived from the micro-datasets that supports
our approach. First, we corroborate that the assumptions of the model
are consistent with reality. This corroboration includes verifying that the structure
of the graphs used is persistent, i.e., that job transitions over time do not simply
occur randomly, but instead are regularly repeated over time, lending support to our
use of static graphs to model job movements. To our knowledge, this is the first time
such a test has been performed on large-scale disaggregated data. With our assumptions, we
show that a restricted version of our model is consistent with the general statistical
features of the data, including the typical number of employees at each firm and the
number of people looking for jobs after being separated from their previous firm.

Graph approaches to the problem of job search have been considered before, albeit
not with our focus. Notably, in Refs. [12, 13, 14, 15, 16] and related work, job search
is analyzed as a social network, where information about vacancies travels along social
ties. This approach, related to the ideas of other social scientists such
as Granovetter [17], has been shown to be consistent with empirical observations.
However, the disadvantage of the social network framework is that social ties are not
usually susceptible to the tools of economic policy, and are also hard to characterize
empirically. Our approach is fundamentally different in that it focuses on the very
entities in which employment takes place: firms. Somewhat related work was carried
out in Ref. [18], where the authors consider a purely theoretical model of worker
transitions between firms as a Markov process, but their approach mixes aggregate
and disaggregate features, and does not attempt any empirical verification. Two of the
authors of the current manuscript proposed the framework of labor flow networks
in a previous publication [19], but focused on studying their empirical properties and
modeling them as the result of economic interactions. In contrast, we use the networks
as a static and persistent structure that shapes labor mobility. In this publication,
we attempt to build the basic modeling framework that can lead to predictions of job
mobility at high resolution (down to the firm level) and show that this approach is
consistent with the collected empirical evidence.

The article is structured as follows: Sec. 2 is dedicated to the construction and
calculations of our labor flow model, including the derivation of the equations that
broadly govern the problem, the main model predictions, and a sketch of the algorithm
necessary to apply the model to data; Sec. 3 is concerned with the empirical analysis
of the data, both to justify our choices in building the model and to compare data with
the predictions of the model; and Sec. 4 presents the final discussion and conclusions
of our work.

2. Modelling worker movement. In order to provide a clear framework, we
begin our detailed discussion by first introducing the assumptions of the model. This
is followed by the calculation of the generating functions and moments of the distributions
of numbers of employees and unemployed agents associated with a firm.
We then present a treatment of the evolution of an individual agent, including job
and unemployment times, which offers an alternative way to calculate the properties
of the model. We conclude the section by explaining how the model can be written
directly from measurable quantities in available data, and how such data needs to be
used in order to predict labor flows.

2.1. Modelling rules and assumptions. Job search is a complicated problem, 
influenced by a number of factors such as skills of an agent, the type of business 
undertaken by a firm, effectiveness in advertising and recruiting workers by firms, etc. 
In order to model the search process in a tractable but efficient way, we must build a 
framework that is at the same time rich enough to avoid losing critical behavior, but 
simple enough that it can shed light on the qualitative features of the problem^2.

^2 Some of the parameters we choose to model here, namely the firm rates of acceptance of
applicants and of being open to receive applicants, are both intuitively motivated and respond
to the needs of more detailed economic modeling; a complementary treatment of our model that
focuses on the economic perspective is contained in an upcoming publication [20] by the authors.

Broadly speaking, there are two basic elements that need to be modelled: the
structure of the economy, and the behavior of the agents.

To represent an economy, we construct a graph $G$ that encodes $N$ firms as nodes,
and edges that represent allowed job transitions agents can undertake (for a review of
graph theory, see [21]). The graph is assumed to be both undirected and unweighted.
When dealing with real data, we develop a procedure to construct such graphs (see
Sec. 3.2), but for the purpose of modelling the agent's behavior, the graph is taken
as input to the model. For a theoretical investigation, one can, for instance, study
job mobility in a graph sampled from an ensemble of random graphs with features
relevant to a research question of interest; for a fully empirical study, one can use a
single graph constructed from data for a specific economic system such as a country.
Either way, we consider the graph to be static, which is to say, not changing in time.
This assumption is in fact driven by our empirical findings (see Sec. 3).

Firms are also characterized by a number of parameters that govern the agent
dynamics, and these are also considered constant in time. One of these parameters is
the probability $\lambda_i$ that an agent at firm $i$ becomes unemployed at any given time step.
While the agent is employed at $i$ it is said to be in state $L_i$, and if it is unemployed
with $i$ being its last employer, it is in state $U_i$. Probability $\lambda_i$ corresponds to the rate
at which agents at $i$ move from state $L_i$ to state $U_i$. This probability can vary from
firm to firm, but any agent employed in a specific firm has the same probability to
become unemployed. An equal probability to become unemployed at a given firm is
equivalent to having equal average employment time (tenure) for all employees of that
firm (see Sec. 2.5).

Another parameter associated with a firm $i$ is the probability $v_i$ per time
step that it will be accepting applications. This parameter is also assumed to be firm
dependent, and it may be interpreted as a combination of a firm's financial strength,
need for personnel, aggressiveness of recruitment, etc.

The last parameter that we must define for a firm $i$ is its rate $h_i$ of hiring applicants,
i.e., the probability that any individual that applies for a job at $i$ becomes
employed. Parameters $h_i$ and $v_i$ play an important role in regulating the size of a
firm. In real economies, even though detailed and systematic data are not available
to determine $h_i$ and $v_i$, they are sensible parameters that one expects to find
associated with firms. We assume their values are in the interval (0,1] in order to be
meaningful in the model.

The behavior of agents is governed by the following rules. First, an agent employed
at a firm (say $i$) at time step $t$ either separates from the firm (moves to state $U_i$), with
probability $\lambda_i$, or remains employed (in state $L_i$). If it remains in $L_i$, it continues onto the
next time step $t + 1$. If it moves to $U_i$, it waits one time step and then looks for a job
at step $t + 1$. To search for a job, an agent in state $U_i$ identifies all node neighbors $j$
that belong to $\Gamma_i$ (the node neighbors of $i$ in $G$) and that are accepting applications
at that time step, each with probability $v_j$. The agent then applies to one of those
neighbors with uniform probability. If none of the neighbors are open, the agent does
not submit any applications and remains in $U_i$ for an additional time step, when it
again tries to find a job. The agent constraint of looking for a job only inside the
graph leads us to define a firm-specific unemployment, which reflects the continued
"association" that agents have to their most recent employer. We assume all agents
are fully aware of all neighbors that are currently accepting applications.





With the model defined as above, we calculate analytical solutions for the average
numbers of employed and unemployed agents at the firms of the graph, the
probability of any agent to be employed or unemployed at a given firm, and provide
the recipe for calculating other quantities of the model. Since most economies spend
large proportions of time in states of small overall change, we focus on the steady
state behavior of the model. This serves as a reasonable starting point for comparing
the predictions of our approach to data from the real world.

Our model, at its core, corresponds to a random walk process on graphs in which 
some of the time scales have been modified by the waiting times that occur both in 
the employed and unemployed states. Formally, the process is a Markov chain, as the 
state of the system depends only on the previous time step. 
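
The rules of this section translate almost line by line into a simulation. The following Python sketch is illustrative only: the names `step` and `simulate`, the toy graph, and all parameter values are assumptions made for the example, and it further assumes that the open or closed status of each firm is drawn once per time step and shared by all applicants.

```python
import random

def step(state, graph, lam, v, h, rng):
    """Advance all agents by one time step.

    state : list of (firm, employed) pairs, one per agent
    graph : dict mapping each firm to the list of its neighbors (Gamma_i)
    lam, v, h : dicts with separation, openness and hiring probabilities
    """
    # Assumption: each firm is open or closed once per step, for all applicants.
    open_now = {i for i in graph if rng.random() < v[i]}
    new_state = []
    for firm, employed in state:
        if employed:
            # Separation with probability lambda_i; a newly unemployed agent
            # only starts searching at the next step.
            new_state.append((firm, rng.random() >= lam[firm]))
            continue
        # Unemployed agents apply to one open neighbor chosen uniformly.
        candidates = [j for j in graph[firm] if j in open_now]
        if candidates:
            target = rng.choice(candidates)
            if rng.random() < h[target]:
                new_state.append((target, True))
                continue
        # No open neighbor, or application rejected: remain in state U_i.
        new_state.append((firm, False))
    return new_state

def simulate(graph, lam, v, h, agents_per_firm=50, steps=200, seed=0):
    """Start with every agent employed and iterate the rules for `steps` steps."""
    rng = random.Random(seed)
    state = [(i, True) for i in graph for _ in range(agents_per_firm)]
    for _ in range(steps):
        state = step(state, graph, lam, v, h, rng)
    return state

# Hypothetical four-firm graph with made-up parameters (not taken from the data).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
lam = {i: 0.1 for i in graph}
v = {i: 0.5 for i in graph}
h = {i: 0.5 for i in graph}
final_state = simulate(graph, lam, v, h)
```

Counting the pairs in `final_state` by firm and by employment status gives one sample of the numbers of employed and unemployed agents per firm; the number of agents is conserved by construction.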

2.2. Evolution of the state of a firm. To begin our detailed study, consider a
given connected undirected graph $G$ with $N$ nodes, and $H$ agents distributed among
the nodes of the graph (the workforce). We focus on the evolution of the system,
captured by the probability distribution $Q_{i,t}(U_{i,t}, L_{i,t})$ of there being $U_{i,t}$ unemployed
and $L_{i,t}$ employed agents at $i$ at time $t$, where $U_{i,t}, L_{i,t}$ are random variables that can
take on values from 0 to $H$. To learn about the steady state of the system, we must
first write down the explicit evolution equation, and consider its behavior in the steady
state where $Q_{i,t} = Q_i^{(s)}$ for all $t$, i.e., the distribution becomes stationary in time.

To specify the evolution equation of the system, we break down each individual
mechanism of the flow process for node $i$ between time steps $t$ and $t + 1$, where the
number of employed and unemployed agents at $i$ and $t$ are $L_{i,t}$ and $U_{i,t}$, respectively.
Consider first $\Lambda_u$, the random variable that represents the number of agents becoming
unemployed at a given time step. Because each agent acts independently, $\Lambda_u$ has a
binomial distribution, i.e.,

(2.1)  $\Pr(\Lambda_u \mid L_{i,t}) = \binom{L_{i,t}}{\Lambda_u} \lambda_i^{\Lambda_u} (1 - \lambda_i)^{L_{i,t} - \Lambda_u}.$

Another mechanism affecting the number of employed agents is the acceptance or
hiring rate $h_i$ of a firm. The number of new employees depends on both the number
of agents that apply for a job at firm $i$, and those that are accepted. Given a number
of applicants $\mathcal{A}_{i,t}$, the probability to accept $\Lambda_l$ of them is also given by a binomial

(2.2)  $\Pr(\Lambda_l \mid \mathcal{A}_{i,t}) = \binom{\mathcal{A}_{i,t}}{\Lambda_l} h_i^{\Lambda_l} (1 - h_i)^{\mathcal{A}_{i,t} - \Lambda_l}.$

The processes related to (2.1) and (2.2) are responsible for the number of agents that
are employed at $i$ at time $t + 1$, namely $L_{i,t+1} = L_{i,t} - \Lambda_u + \Lambda_l$, with probability
given by the product of the two binomials above^3.

^3 The distribution of $\mathcal{A}_{i,t}$ in the steady state is the same as that for the outflow of agents
from a firm, and thus it is given by (2.14) below. We do not make use of this result in developing
the article, and thus omit it.

From the standpoint of the number of unemployed agents at $t + 1$, $U_{i,t+1}$ depends
upon $U_{i,t}$, $\Lambda_u$, and the agents in state $U_i$ that find employment elsewhere, which we
specify in detail below. For that purpose, we define $\gamma_{i,t}$, the subset of $\Gamma_i$ of neighbors
of $i$ that are accepting job applications at time step $t$. The probability to draw any
given subset $\gamma_{i,t}$ is given by the joint distribution

(2.3)  $\Pr(\gamma_{i,t}) = \prod_{j \in \gamma_{i,t}} v_j \prod_{m \in \bar{\gamma}_{i,t}} (1 - v_m)$

where the set $\bar{\gamma}_{i,t}$ is the complement set of $\gamma_{i,t}$ with respect to $\Gamma_i$, i.e.,
$\gamma_{i,t} \cup \bar{\gamma}_{i,t} = \Gamma_i$ and $\gamma_{i,t} \cap \bar{\gamma}_{i,t} = \emptyset$. The use of $t$ when referring to any
$\gamma_{i,t}$ is not strictly necessary, as the configurations of open neighbors are sampled
independently each time step, and thus we drop reference to $t$ for these sets. When at
least one neighbor is accepting applications, the probability for any agent to apply to a
specific open neighbor of $i$ is equal to $1/|\gamma_i|$. Therefore, job applications are distributed
among $\gamma_i$ according to a multinomial distribution. Given $U_{i,t}$ unemployed agents, with
$\nu_{ij}$ applying to neighbor $j \in \gamma_i$, and using the symbol $\nu_i$ to represent the entire
application allocation to all nodes in $\gamma_i$, the distribution of applications to the
neighbors is given by

(2.4)  $\Pr(\nu_i \mid U_{i,t}, \gamma_i) = \binom{U_{i,t}}{\nu_i} \left( \frac{1}{|\gamma_i|} \right)^{U_{i,t}}$

where we have used a shorthand notation for the multinomial coefficient given by

(2.5)  $\binom{U_{i,t}}{\nu_i} := \binom{U_{i,t}}{\nu_{ij_1}, \nu_{ij_2}, \dots, \nu_{ij_{|\gamma_i|}}}$

with $j_1, \dots, j_{|\gamma_i|}$ the elements of $\gamma_i$. Given an acceptance rate of $h_j$ for neighbor $j$,
$\eta_{ij}$ agents are hired at $j$ out of the $\nu_{ij}$ that apply, and this random variable is also
distributed in binomial fashion,

(2.6)  $\Pr(\eta_{ij} = x \mid \nu_{ij}) = \binom{\nu_{ij}}{x} h_j^{x} (1 - h_j)^{\nu_{ij} - x}.$

Altogether, representing the total accepted applications by
$\eta_i := (\eta_{ij_1}, \dots, \eta_{ij_{|\gamma_i|}})$, the probability for those acceptances is

(2.7)  $\Pr(\eta_i \mid \nu_i, \gamma_i) = \prod_{j \in \gamma_i} \binom{\nu_{ij}}{\eta_{ij}} h_j^{\eta_{ij}} (1 - h_j)^{\nu_{ij} - \eta_{ij}}.$

If in a given time step all neighbors are closed to new applicants then, by construction,
$\nu_{ij} = 0$ for all $j$, and similarly for $\eta_{ij}$. Symbolically, $\gamma_i = \emptyset$ and $\bar{\gamma}_i = \Gamma_i$, and this
occurs with probability $\prod_{j \in \Gamma_i} (1 - v_j)$. In this case, $\Pr(\nu_i \mid U_{i,t}, \gamma_i = \emptyset)$ is equal to
$\delta[\nu_i, 0]$ by use of the Kronecker delta, with the convention that $\nu_i = 0$ means that
all $\nu_{ij} = 0$. Analogously, $\Pr(\eta_i \mid \nu_i, \gamma_i)$ is $\delta[\eta_i, 0]$ when all neighbors are closed. Let
$|\eta_i|$ represent the total number of agents accepted into other positions, given by
$|\eta_i| = \sum_{j \in \Gamma_i} \eta_{ij}$. Then the number of agents in state $U_i$ at time $t + 1$ is given by
$U_{i,t+1} = U_{i,t} + \Lambda_u - |\eta_i|$. In particular, when $\gamma_i = \emptyset$, $|\eta_i| = 0$.
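
For a firm with few neighbors, the distribution (2.3) can be enumerated explicitly. The sketch below is a minimal illustration with made-up values of $v_j$ and hypothetical function names; it lists every configuration of open neighbors with its probability and derives the probability that a single unemployed agent applies to each neighbor, i.e., the sum of $\Pr(\gamma_i)/|\gamma_i|$ over the configurations that contain that neighbor.

```python
from itertools import combinations
from math import prod

def open_configurations(neighbors, v):
    """Return {gamma: Pr(gamma)} over all subsets gamma of the neighbor set."""
    configs = {}
    for size in range(len(neighbors) + 1):
        for gamma in combinations(neighbors, size):
            closed = [m for m in neighbors if m not in gamma]
            # Open neighbors contribute v_j, closed neighbors contribute 1 - v_m.
            configs[gamma] = prod(v[j] for j in gamma) * prod(1 - v[m] for m in closed)
    return configs

def application_probabilities(neighbors, v):
    """Probability that one unemployed agent applies to each neighbor in a step."""
    configs = open_configurations(neighbors, v)
    return {
        j: sum(p / len(gamma) for gamma, p in configs.items() if j in gamma)
        for j in neighbors
    }

# Hypothetical firm with three neighbors and made-up openness probabilities.
neighbors = [1, 2, 3]
v = {1: 0.2, 2: 0.5, 3: 0.9}
configs = open_configurations(neighbors, v)
apply_prob = application_probabilities(neighbors, v)
```

The configuration probabilities sum to one, and the application probabilities sum to one minus the probability that all neighbors are closed. The enumeration grows as $2^{|\Gamma_i|}$, so it is only practical for small neighbor sets.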


To summarize the evolution, we must collect all the previous mechanisms, summing
over all possible $\gamma_i$, $\mathcal{A}_{i,t}$, $\Lambda_u$, $\Lambda_l$, $\nu_i$, $\eta_i$, and in addition, since there are multiple
states at time $t$ compatible with a given state at time $t + 1$, one must also sum
over $U_{i,t}, L_{i,t}$. Writing a single summation symbol for the previous variables, the
full expression for the evolution of $Q_{i,t}$ is given by (omitting the conditionals on the
distributions)

(2.8)  $Q_{i,t+1}(U_{i,t+1}, L_{i,t+1}) = \sum Q_{i,t}(U_{i,t}, L_{i,t}) \, \delta[L_{i,t+1}, L_{i,t} - \Lambda_u + \Lambda_l] \, \delta[U_{i,t+1}, U_{i,t} + \Lambda_u - |\eta_i|]$
       $\times \Pr(\gamma_i) \Pr(\Lambda_l) \Pr(\mathcal{A}_{i,t}) \Pr(\Lambda_u) \{ \delta[\gamma_i, \emptyset] \, \delta[\nu_i, 0] \, \delta[|\eta_i|, 0] + (1 - \delta[\gamma_i, \emptyset]) \Pr(\nu_i) \Pr(|\eta_i|) \}$

where $\delta[\gamma_i, \emptyset] = 1$ only when $\gamma_i = \emptyset$ and 0 otherwise, and similarly $\delta[\nu_i, 0] = 1$
only when all $\nu_{ij} = 0$ and 0 otherwise. The use of $|\eta_i|$ in both terms of the brackets
is a shorthand for the fact that in order to have a net outflow of agents equal to $|\eta_i|$,
one must take all possible combinations of $\{\eta_{ij}\}_{j \in \gamma_i}$ for given $\gamma_i$ and take those for
which the overall flow is $|\eta_i|$; in other words, we are implicitly using an additional
factor $\delta[|\eta_i|, \sum_{j \in \gamma_i} \eta_{ij}]$.

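It is convenient to employ the generating function formalism [22] for calculating
moments of the distribution. By definition, the generating function of $Q_{i,t}(U_{i,t}, L_{i,t})$
is

(2.9)  $\mathcal{Q}_{i,t}(x, y) = \sum_{U_{i,t}, L_{i,t}} Q_{i,t}(U_{i,t}, L_{i,t}) \, x^{U_{i,t}} y^{L_{i,t}},$

and similarly for $\mathcal{Q}_{i,t+1}$. Using this definition on (2.8) applied to time $t + 1$, one
obtains the relation

(2.10)  $\mathcal{Q}_{i,t+1}(x, y) = \phi(1 - h_i + h_i y) \Big\{ \mathcal{Q}_{i,t}[x, x\lambda_i + y(1 - \lambda_i)] \Pr(\gamma_i = \emptyset)$
        $+ \sum_{\gamma_i \neq \emptyset} \mathcal{Q}_{i,t}[\langle h \rangle_{\gamma_i} + x(1 - \langle h \rangle_{\gamma_i}), x\lambda_i + y(1 - \lambda_i)] \Pr(\gamma_i) \Big\}$

where $\phi$ is the generating function associated with the distribution $\Pr(\mathcal{A}_{i,t})$,
$\langle h \rangle_{\gamma_i} := \sum_{j \in \gamma_i} h_j / |\gamma_i|$, and the notation $\sum_{\gamma_i \neq \emptyset}$ means that the sum runs over all possible
configurations $\{\gamma_i\}$ of open neighbors of $i$ except for the case when all neighbors are
not accepting applications.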

The previous results can be specialized to the steady state, where $\mathcal{Q}_{i,t} \to \mathcal{Q}_i^{(s)}$ is
independent of time. For now, we assume that this steady state exists and determine
some of the statistical properties of the process such as the number of employed and
unemployed agents; the existence of a steady state solution is shown later (Secs. 2.3
and 2.4).

The generating function (2.10) can be used to calculate moments of $Q_i^{(s)}(U_i, L_i)$,
although the algebra can be cumbersome for higher moments. For the average unemployment
associated with firm $i$, we have

(2.11)  $\langle U_i \rangle = \left. \frac{\partial \mathcal{Q}_i^{(s)}(x, y)}{\partial x} \right|_{x = y = 1}$

and for the average employment,

(2.12)  $\langle L_i \rangle = \left. \frac{\partial \mathcal{Q}_i^{(s)}(x, y)}{\partial y} \right|_{x = y = 1}.$

By substituting the steady state distribution on both sides of (2.10), and using
the chain rule when taking derivatives of $x$ and $y$^4, we obtain from (2.11) and (2.12)

(2.13)  $\langle U_i \rangle = \frac{\lambda_i \langle L_i \rangle}{\sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)}$

where we drop $t$ since we are in the steady state, and the sum is over all possible
$\gamma_i$ except $\gamma_i = \emptyset$. Note that this expression indicates how average employment and
unemployment relate to each other, but does not provide a solution that is solely
based on the basic parameters of the problem. To construct a full solution, we must
analyze in detail the flows of agents in the system, which we proceed to tackle next.
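
The intermediate algebra behind (2.13) can be sketched as follows. The sketch assumes only that $\phi(1) = 1$, that $\mathcal{Q}_i^{(s)}(1,1) = 1$, that the probabilities $\Pr(\gamma_i)$ sum to one over all configurations including $\gamma_i = \emptyset$, and that the first argument of the generating function in the sum over non-empty $\gamma_i$ in (2.10) is $\langle h \rangle_{\gamma_i} + x(1 - \langle h \rangle_{\gamma_i})$. Differentiating the steady state form of (2.10) with respect to $x$ and setting $x = y = 1$ gives

```latex
\begin{align*}
\langle U_i \rangle
  &= \big[ \langle U_i \rangle + \lambda_i \langle L_i \rangle \big] \Pr(\gamma_i = \emptyset)
   + \sum_{\gamma_i \neq \emptyset}
     \big[ (1 - \langle h \rangle_{\gamma_i}) \langle U_i \rangle + \lambda_i \langle L_i \rangle \big] \Pr(\gamma_i) \\
  &= \langle U_i \rangle + \lambda_i \langle L_i \rangle
   - \langle U_i \rangle \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i) ,
\end{align*}
```

so that $\lambda_i \langle L_i \rangle = \langle U_i \rangle \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)$, which rearranges to (2.13).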


2.3. Average employment and unemployment of a firm. In order to make
progress, we study the full distribution $\Pr(|\eta_i|)$ of outgoing agents from firm $i$. Let
us recall that the distribution of outgoing application allocations is governed by
$\Pr(\nu_i \mid U_{i,t}, \gamma_i)$ and the hirings by $\Pr(\eta_i \mid \nu_i, \gamma_i)$. Furthermore, the overall flow is also
dependent on $\gamma_i$ and $U_{i,t}$ (through $Q_{i,t}(U_{i,t}, L_{i,t})$). We must also keep in mind that $\gamma_i$
can be the empty set when no neighbors are receiving applicants. Therefore, summing
over $U_{i,t}, L_{i,t}, \nu_i$ and $\{\gamma_i\}$ (the set of all possible configurations of open and closed
neighbors of $i$), we have

(2.14)  $\Pr(|\eta_i|) = \sum Q_{i,t}(U_{i,t}, L_{i,t}) \Pr(\gamma_i) \{ \delta[\gamma_i, \emptyset] \, \delta[\nu_i, 0] \, \delta[|\eta_i|, 0]$
        $+ (1 - \delta[\gamma_i, \emptyset]) \Pr(\nu_i \mid U_{i,t}, \gamma_i) \Pr(|\eta_i| \mid \nu_i, \gamma_i) \},$

where we have kept the conditionals to avoid confusion. The corresponding generating
function for $\Pr(|\eta_i|)$ is given by

(2.15)  $\psi(x) = \Pr(\gamma_i = \emptyset) + \sum_{\gamma_i \neq \emptyset} \sum_{L_{i,t}} \mathcal{Q}_{i,t}[1 - \langle h \rangle_{\gamma_i} + x \langle h \rangle_{\gamma_i}, L_{i,t}] \Pr(\gamma_i).$

Since $\psi(x) = \sum_{|\eta_i|} \Pr(|\eta_i|) x^{|\eta_i|}$, the sum over $L_{i,t}$ remains explicit since there is no
additional variable $y$ that sums over the second argument of $Q_{i,t}(U_{i,t}, L_{i,t})$. Despite
this, we still use $\mathcal{Q}_{i,t}$ to represent the generating function summed only over $U_{i,t}$. In
the steady state, the average outflow is given by the first derivative $d\psi/dx$ evaluated
at $x = 1$, which produces

(2.16)  $\langle |\eta_i| \rangle = \langle U_i \rangle \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)$

and with the use of (2.13),

(2.17)  $\langle |\eta_i| \rangle = \lambda_i \langle L_i \rangle$

which is intuitively sound, as the number of agents that become unemployed and look
for jobs is on average $\lambda_i \langle L_i \rangle$ and therefore they must flow elsewhere for the steady
state to be achieved. A similar calculation leads to the average steady state agent
flow along a particular edge, which is

(2.18)  $\langle \eta_{ij} \rangle = \langle U_i \rangle h_j \sum_{\{\gamma_i^{(j)}\}} \frac{\Pr(\gamma_i^{(j)})}{|\gamma_i^{(j)}|}$

where $\{\gamma_i^{(j)}\}$ is the set of all possible configurations of open and closed neighbors of
$i$ in which node $j$ is guaranteed to be present (open), and the sum is over all such
configurations.

^4 Note, for instance, that taking the $x$ derivative of $\mathcal{Q}_i^{(s)}(x, x\lambda_i + y(1 - \lambda_i))$ leads to
$\partial \mathcal{Q}_i^{(s)}(\alpha, \beta)/\partial \alpha + \lambda_i \, \partial \mathcal{Q}_i^{(s)}(\alpha, \beta)/\partial \beta$, which is equal to $\langle U_i \rangle + \lambda_i \langle L_i \rangle$ when evaluated at
$x = y = 1$ (which leads to $\alpha = \beta = 1$).
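
The relation between the total outflow (2.16) and the edge flows (2.18) can be checked numerically for a small neighbor set. The following sketch uses made-up values of $v_j$ and $h_j$ and hypothetical function names; it computes, per unemployed agent, the total hire rate and the hire rate along each edge, whose sum over edges should reproduce the total.

```python
from itertools import combinations
from math import prod

def nonempty_configurations(neighbors, v):
    """Yield (gamma, Pr(gamma)) for every non-empty set gamma of open neighbors."""
    for size in range(1, len(neighbors) + 1):
        for gamma in combinations(neighbors, size):
            closed = [m for m in neighbors if m not in gamma]
            yield gamma, prod(v[j] for j in gamma) * prod(1 - v[m] for m in closed)

def outflow_per_unemployed(neighbors, v, h):
    """Sum over non-empty gamma of <h>_gamma Pr(gamma): the factor of <U_i> in (2.16)."""
    return sum(
        p * sum(h[j] for j in gamma) / len(gamma)
        for gamma, p in nonempty_configurations(neighbors, v)
    )

def edge_flow_per_unemployed(neighbors, v, h, j):
    """h_j times the sum of Pr(gamma)/|gamma| over gamma containing j: the factor of <U_i> in (2.18)."""
    return h[j] * sum(
        p / len(gamma)
        for gamma, p in nonempty_configurations(neighbors, v)
        if j in gamma
    )

# Hypothetical firm with three neighbors and made-up parameters.
neighbors = [1, 2, 3]
v = {1: 0.2, 2: 0.5, 3: 0.9}
h = {1: 0.7, 2: 0.4, 3: 0.1}
total = outflow_per_unemployed(neighbors, v, h)
per_edge = {j: edge_flow_per_unemployed(neighbors, v, h, j) for j in neighbors}
```

Multiplying `total` or any entry of `per_edge` by $\langle U_i \rangle$ gives the corresponding average flow.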


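The steady state condition is satisfied if the average flows into and out of a node
(firm) are equal. This implies

(2.19)  $\langle |\eta_i| \rangle = \langle \Lambda_l \rangle = \sum_{j \in \Gamma_i} \langle \eta_{ji} \rangle.$

Using (2.13), (2.17), and (2.18), one can restate this as

(2.20)  $\lambda_i \langle L_i \rangle = \sum_{j \in \Gamma_i} \frac{\lambda_j h_i \langle L_j \rangle \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)}) / |\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)}.$

This expression provides a system of equations that can in principle be solved for all
$\langle L_i \rangle$, provided such a solution exists.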


To understand this further, we write (2.20) in matrix form making use of the
adjacency matrix of the graph, $\mathbf{A}$, for which $A_{ij} = A_{ji} = 1$ if $i$ and $j$ have an edge
connecting them, and zero otherwise. This produces the expression

(2.21)  $\sum_{j=1}^{N} \left[ A_{ij} \frac{h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)}) / |\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)} - \delta[i, j] \right] \lambda_j \langle L_j \rangle = 0$

for all $i$. This represents a homogeneous system of linear equations, which always
has the trivial null solution, and has non-trivial solutions if and only if the matrix
contained inside brackets is singular which, among other things, implies that the
matrix does not have full rank [23]. To show that our model indeed has non-trivial
solutions, we define the matrix $\mathbf{\Lambda}$, with element $\Lambda_{ij}$ corresponding to the expression
inside brackets

(2.22)  $\Lambda_{ij} := A_{ij} \frac{h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)}) / |\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)} - \delta[i, j].$




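This matrix does not possess full rank, as can be seen explicitly from the fact that all
columns add to zero. To show this, we first sum $\Lambda_{ij}$ over $i$

(2.23)  $\sum_{i=1}^{N} \Lambda_{ij} = -1 + \frac{\sum_{i=1}^{N} A_{ij} h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)}) / |\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)}$

where $-1$ comes from $-\delta[i, j]$. We can now show that the numerator and denominator
of the second term are indeed equal. To see this in detail, we organize the
elements of $\{\gamma_j^{(i)}\}$ by cardinality $|\gamma_j^{(i)}|$, and rewrite the numerator as

(2.24)  $\sum_{i=1}^{N} A_{ij} h_i \sum_{\{\gamma_j^{(i)}\}} \frac{\Pr(\gamma_j^{(i)})}{|\gamma_j^{(i)}|} = \sum_{c=1}^{|\Gamma_j|} \frac{1}{c} \sum_{i} A_{ij} h_i \sum_{|\gamma_j^{(i)}| = c} \Pr(\gamma_j^{(i)}),$

where the last sum is over all elements of $\{\gamma_j^{(i)}\}$ with equal size $c$. Now, the sum over
$i$ guarantees that each neighbor of $j$ belonging to a particular $\gamma_j$ is summed, along
with the corresponding $h_r$, where $r \in \gamma_j$. Therefore, the sum over $i$ can be rewritten
as

(2.25)  $\sum_{i} A_{ij} h_i \sum_{|\gamma_j^{(i)}| = c} \Pr(\gamma_j^{(i)}) = \sum_{|\gamma_j| = c} \Big( \sum_{r \in \gamma_j} h_r \Big) \Pr(\gamma_j)$

and inserting this into the sum over $c$ leads to

(2.26)  $\sum_{c=1}^{|\Gamma_j|} \frac{1}{c} \sum_{|\gamma_j| = c} \Big( \sum_{r \in \gamma_j} h_r \Big) \Pr(\gamma_j) = \sum_{\gamma_j \neq \emptyset} \frac{\sum_{r \in \gamma_j} h_r}{|\gamma_j|} \Pr(\gamma_j) = \sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j).$

Therefore,

(2.27)  $\sum_{i=1}^{N} A_{ij} h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)}) / |\gamma_j^{(i)}| = \sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)$

which means that for all $j$, (2.23) is identically zero.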

The fact that $\mathbf{\Lambda}$ has reduced rank can also be seen from (2.16) and (2.18), which
imply

(2.28)  $\Lambda_{ij} = A_{ij} \frac{\langle \eta_{ji} \rangle}{\langle |\eta_j| \rangle} - \delta[i, j],$

and because, by definition, $|\eta_j| = \sum_{i \in \Gamma_j} \eta_{ji}$, one arrives at $\sum_{i} \Lambda_{ij} = 0$ as before.

But matrix $\mathbf{\Lambda}$, as expressed in (2.28), manifestly represents the Laplacian matrix
of a random walk with heterogeneous transition probabilities on the edges of the
graph, a well-understood process [21]. Such walks are known to be ergodic, and their
convergence rate can be calculated through the spectral properties of $G$.
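
The column-sum property can also be checked numerically. The sketch below is an illustration on a hypothetical four-firm graph with made-up values of $v_i$ and $h_i$, and the helper names are illustrative. It builds the matrix of (2.22) by brute-force enumeration of open-neighbor configurations, computing numerator and denominator independently, and then sums each column.

```python
from itertools import combinations
from math import prod

def nonempty_configurations(neighbors, v):
    """Yield (gamma, Pr(gamma)) for every non-empty set gamma of open neighbors."""
    for size in range(1, len(neighbors) + 1):
        for gamma in combinations(neighbors, size):
            closed = [m for m in neighbors if m not in gamma]
            yield gamma, prod(v[r] for r in gamma) * prod(1 - v[m] for m in closed)

def flow_matrix(graph, v, h):
    """Matrix with entries Lambda_ij of (2.22); rows and columns follow sorted(graph)."""
    nodes = sorted(graph)
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for col, j in enumerate(nodes):
        configs = list(nonempty_configurations(graph[j], v))
        # Denominator of (2.22): sum over non-empty gamma_j of <h>_gamma Pr(gamma).
        denom = sum(p * sum(h[r] for r in g) / len(g) for g, p in configs)
        for row, i in enumerate(nodes):
            if i in graph[j]:
                # Numerator of (2.22): configurations of j's neighbors that contain i.
                numer = h[i] * sum(p / len(g) for g, p in configs if i in g)
                matrix[row][col] = numer / denom
            if i == j:
                matrix[row][col] -= 1.0
    return matrix

# Hypothetical undirected graph with made-up heterogeneous parameters.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
v = {0: 0.3, 1: 0.6, 2: 0.8, 3: 0.5}
h = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.7}
Lam = flow_matrix(graph, v, h)
column_sums = [sum(Lam[i][j] for i in range(len(Lam))) for j in range(len(Lam))]
```

Each entry of `column_sums` should vanish up to floating-point error, in line with (2.27).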

To develop the rest of the theory, we focus on graphs with a single connected
component containing all $N$ nodes (a connected graph), and explain the more general
case below (see Sec. 2.6). Defining the column matrix $\mathbf{X}$ with elements

(2.29)  $X_j = \lambda_j \langle L_j \rangle$

for the average employment in the firms of the system, one obtains the homogeneous
system of equations

(2.30)  $\mathbf{\Lambda} \mathbf{X} = \mathbf{0}$

where the right hand side is the column matrix of dimension $N \times 1$ of zeros. The
non-trivial solutions to this system, if they exist, depend on $\mathbf{\Lambda}$ being singular, which
is valid in our case. Since the matrix for a connected graph has rank $N - 1$, its kernel
is one-dimensional, and thus, to choose a unique solution that belongs to the kernel
of $\mathbf{\Lambda}$ one needs a single additional condition. In our case, this condition corresponds
to the total number of agents $H$ in the system, i.e.

(2.31)  $\sum_{i=1}^{N} ( \langle L_i \rangle + \langle U_i \rangle ) = H.$

Application of (2.31), as illustrated below, leads to the desired unique solution.
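
In practice, (2.30) and (2.31) can be solved numerically. The following is a minimal sketch under stated assumptions: the column-stochastic matrix $T = \mathbf{\Lambda} + I$ and the per-firm values of $\sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)$ are taken as already computed, the toy numbers are made up, and the averaged (lazy) power iteration is just one convenient standard method, not one prescribed by the text.

```python
def steady_state(T, lam, hire, H, iterations=500):
    """Solve Lambda X = 0 with T = Lambda + I, then apply the normalization (2.31).

    T    : column-stochastic matrix; T[i][j] is the share of hires leaving j that land at i
    lam  : separation probabilities lambda_i
    hire : per-firm values of sum over non-empty gamma_i of <h>_gamma Pr(gamma)
    H    : total number of agents
    """
    n = len(T)
    x = [1.0 / n] * n
    for _ in range(iterations):
        tx = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Averaging with the previous iterate avoids oscillations on bipartite graphs.
        x = [0.5 * (x[i] + tx[i]) for i in range(n)]
    employed = [x[i] / lam[i] for i in range(n)]      # from X_i = lambda_i <L_i>
    unemployed = [x[i] / hire[i] for i in range(n)]   # from (2.13)
    scale = H / (sum(employed) + sum(unemployed))
    return [scale * e for e in employed], [scale * u for u in unemployed]

# Hypothetical three-firm path graph 0 - 1 - 2 with made-up transition shares.
T = [[0.0, 0.3, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.7, 0.0]]
lam = [0.1, 0.1, 0.1]
hire = [0.5, 0.5, 0.5]
L_avg, U_avg = steady_state(T, lam, hire, H=1000)
```

For large graphs a sparse eigenvector solver would be a more natural choice; the dense loop here is only meant to make the two steps, kernel vector and normalization, explicit.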


Solving (2.30) and (2.31) in the general case does not produce compact solutions.
However, it is possible to obtain some explicit solutions for simple cases, such as when
the probability that a firm is open to hire is homogeneous over all nodes ($v_j = v$ for
all $j$). Explicitly, note that in the homogeneous case $\Pr(\gamma_j) \to v^{|\gamma_j|} (1 - v)^{|\Gamma_j| - |\gamma_j|}$. It
is common in the networks and graph theory literature to use the notation $k_j = |\Gamma_j|$,
and refer to $k_j$ as the degree of node $j$. Then,

(2.32)  $\sum_{\{\gamma_j^{(i)}\}} \frac{\Pr(\gamma_j^{(i)})}{|\gamma_j^{(i)}|} = \sum_{|\gamma_j^{(i)}| = 1}^{k_j} \binom{k_j - 1}{|\gamma_j^{(i)}| - 1} \frac{v^{|\gamma_j^{(i)}|} (1 - v)^{k_j - |\gamma_j^{(i)}|}}{|\gamma_j^{(i)}|} = \frac{1 - (1 - v)^{k_j}}{k_j}.$
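
The closed form in (2.32) rests on a standard binomial identity. A short worked version, writing $c$ for $|\gamma_j^{(i)}|$ and $k$ for $k_j$ (shorthand introduced here only for brevity), uses $\frac{1}{c}\binom{k-1}{c-1} = \frac{1}{k}\binom{k}{c}$ and the binomial theorem:

```latex
\begin{align*}
\sum_{c=1}^{k} \frac{1}{c} \binom{k-1}{c-1} v^{c} (1-v)^{k-c}
  &= \frac{1}{k} \sum_{c=1}^{k} \binom{k}{c} v^{c} (1-v)^{k-c} \\
  &= \frac{1}{k} \Big[ \big( v + (1-v) \big)^{k} - (1-v)^{k} \Big]
   = \frac{1 - (1-v)^{k}}{k} .
\end{align*}
```

The subtracted term $(1-v)^{k}$ is the $c = 0$ contribution, i.e., the probability that all $k$ neighbors are closed.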


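For the sum $\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)$, we note that each acceptance rate $h_i$ for $i \in \Gamma_j$
appears $\binom{k_j - 1}{|\gamma_j| - 1}$ times among all the terms where there are $|\gamma_j|$ open neighbors to $j$.
One can then write in the homogeneous case

(2.33)  $\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j) = \sum_{|\gamma_j| = 1}^{k_j} \binom{k_j - 1}{|\gamma_j| - 1} \frac{\sum_{i \in \Gamma_j} h_i}{|\gamma_j|} v^{|\gamma_j|} (1 - v)^{k_j - |\gamma_j|} = \langle h \rangle_{\Gamma_j} [1 - (1 - v)^{k_j}]$

where $\langle h \rangle_{\Gamma_j} := \sum_{i \in \Gamma_j} h_i / k_j$, i.e., the average hiring rate of the full neighbor set of $j$.
In this case, the matrix $\mathbf{\Lambda}$ takes on the form

(2.34)  $\Lambda_{ij}^{(v)} = \frac{A_{ij} h_i}{k_j \langle h \rangle_{\Gamma_j}} - \delta[i, j]$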

and in the very simple example where all hi are equal, A is equal to the usual nor¬ 
malized Laplacian for random walks on an unweighted graph. To refer to this model, 
we introduce the superscript (v) as a reminder that this quantity is now constant. By 
inspection, we can find a solution for X, which provides 


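(2.35)  $\langle L_i \rangle^{(v)} = \frac{\rho h_i \langle h \rangle_{\Gamma_i} k_i}{\lambda_i}$

and

(2.36)  $\langle U_i \rangle^{(v)} = \frac{\rho h_i k_i}{1-(1-v)^{k_i}},$

where $\rho$ is a constant that can be obtained by imposing (2.31), and is given by

(2.37)  $\rho = \frac{H}{\sum_i h_i \langle h \rangle_{\Gamma_i} k_i \left[ \frac{1}{\lambda_i} + \frac{1}{\langle h \rangle_{\Gamma_i} [1-(1-v)^{k_i}]} \right]}.$

This quantity has an intuitive interpretation, in that it captures the average flow rate
of workers all through the system.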
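As a numerical illustration of the homogeneous solution, the following Python sketch evaluates the expressions for $\langle L_i \rangle^{(v)}$, $\langle U_i \rangle^{(v)}$ and $\rho$ on a small graph and checks that inflow and outflow balance at every node. The four-node graph and all parameter values (`lam`, `h`, `v`, `H`) are hypothetical choices made only for this example; they are not taken from the data.

```python
# Homogeneous-opening-rate solution on a small, made-up connected graph.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # undirected edges
lam = {0: 0.10, 1: 0.20, 2: 0.05, 3: 0.15}   # separation rates lambda_i (assumed)
h = {0: 0.5, 1: 0.8, 2: 0.6, 3: 0.9}         # acceptance rates h_i (assumed)
v = 0.3                                      # common opening probability (assumed)
H = 1000.0                                   # total number of agents (assumed)

k = {i: len(nb) for i, nb in neighbors.items()}                 # degrees k_i
h_avg = {i: sum(h[j] for j in nb) / k[i]                        # <h>_{Gamma_i}
         for i, nb in neighbors.items()}
open_any = {i: 1 - (1 - v) ** k[i] for i in neighbors}          # 1 - (1 - v)^{k_i}
xi = {i: h_avg[i] * open_any[i] for i in neighbors}             # effective hiring rate

# rho follows from requiring that employed plus unemployed agents add up to H.
rho = H / sum(h[i] * h_avg[i] * k[i] * (1 / lam[i] + 1 / xi[i]) for i in neighbors)

L = {i: rho * h[i] * h_avg[i] * k[i] / lam[i] for i in neighbors}   # employed
U = {i: rho * h[i] * k[i] / open_any[i] for i in neighbors}         # unemployed


def residual(i):
    """Steady-state inflow into firm i minus outflow from firm i."""
    inflow = h[i] * sum(U[j] * open_any[j] / k[j] for j in neighbors[i])
    return inflow - lam[i] * L[i]


for i in neighbors:
    print(i, round(L[i], 3), round(U[i], 3), residual(i))
```

The residuals vanish up to floating-point error, and the ratio `L[i] / U[i]` equals `xi[i] / lam[i]` for every node.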




2.4. The agent perspective. The results from the previous section were derived
with the population of agents in mind. In that context we showed that there are
non-trivial solutions for $\langle L_i \rangle$ in the model, and derived the equation that describes
the system.

An alternative approach to the solution of the model is to consider the single agent 
perspective. This approach is a valid alternative to solve the model because agents 
are non-interacting, and therefore the dynamics of any one of them are sufficient to 
rederive the results above. In this section, we elaborate on this approach still within 
the context of connected graphs. 

Taking the view of an individual agent, it is convenient to define the probabilities
$r(i,t)$ and $s(i,t)$ that the agent would be, respectively, employed or unemployed at
the node i at time t. These two probabilities, explained in detail below, satisfy the
equations

(2.38)  $r(i,t) = (1-\lambda_i)\, r(i,t-1) + h_i \sum_{j \in \Gamma_i} s(j,t-1) \sum_{\{\gamma_j^{(i)}\}} \frac{\Pr(\gamma_j^{(i)})}{|\gamma_j^{(i)}|}$

(2.39)  $s(i,t) = \lambda_i\, r(i,t-1) + s(i,t-1) \left[ \sum_{\gamma_i \neq \emptyset} \Pr(\gamma_i) \sum_{j \in \gamma_i} \frac{1-h_j}{|\gamma_i|} + \Pr(\emptyset) \right]$

where the square brackets of the second equation can be simplified to $1 - \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)$.
The first equation states that the probability for an agent to be employed at node i at time t is
given by the probability to be employed at node i at time $t-1$ and not become unemployed,
plus the probability that the agent is unemployed at one of the neighbors of i, that i
is accepting applications, that the agent chooses to apply to i, and that the application
by the agent leads to being hired. The second equation states that the probability to
be unemployed at i at time t is given by the probability to be employed at i at time
$t-1$ and be separated with probability $\lambda_i$, or to have been unemployed at time $t-1$
at i but not find a job among the neighbors of i, either because they are all closed,
or because the agent chooses to apply to one of the neighbors and is not hired.

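The previous results lead to a set of difference equations that can be written as
a matrix equation with block structure. In the steady state, this matrix equation is
simplified because the conditions $r(i,t) - r(i,t-1) = 0$ and $s(i,t) - s(i,t-1) = 0$ are
satisfied. Given that in the steady state r and s no longer depend on time, we write
the equations for $r(i,t) \to r_\infty(i)$ and $s(i,t) \to s_\infty(i)$ in the steady state

(2.40)  $0 = -\lambda_i r_\infty(i) + h_i \sum_{j \in \Gamma_i} s_\infty(j) \sum_{\{\gamma_j^{(i)}\}} \frac{\Pr(\gamma_j^{(i)})}{|\gamma_j^{(i)}|}$

(2.41)  $0 = \lambda_i r_\infty(i) - s_\infty(i) \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i),$

which can be solved by first expressing $s_\infty(i)$ in terms of $r_\infty(i)$

(2.42)  $s_\infty(i) = \frac{\lambda_i}{\sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)}\, r_\infty(i)$

and substituting into (2.40) to produce

(2.43)  $\sum_{j=1}^{N} \left[ \frac{A_{ij} h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)})/|\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} \langle h \rangle_{\gamma_j} \Pr(\gamma_j)} - \delta_{ij} \right] \lambda_j r_\infty(j) = 0.$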


The matrix in brackets is simply $\Lambda$ defined in (2.22). As we have seen, the matrix
does not have complete rank, guaranteeing the existence of non-trivial solutions. The
steady state with homogeneous probability $v_i = v$ for firms to be open leads to
solutions similar to those above for the entire population of agents, but with a different
$\rho$ which we relabel as $\chi$, i.e.,

(2.44)  $r_\infty^{(v)}(i) = \frac{\chi h_i \langle h \rangle_{\Gamma_i} k_i}{\lambda_i}$

(2.45)  $s_\infty^{(v)}(i) = \frac{\chi h_i k_i}{1-(1-v)^{k_i}}$

(2.46)  $\chi = \rho(H=1) = \frac{1}{\sum_{i \in G} h_i \langle h \rangle_{\Gamma_i} k_i \left[ \frac{1}{\lambda_i} + \frac{1}{\langle h \rangle_{\Gamma_i} [1-(1-v)^{k_i}]} \right]},$

where the normalization condition is $\sum_i [r(i) + s(i)] = 1$ (independent of the steady
state or the condition $v_i = v$).
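The single-agent difference equations can also be iterated directly. The sketch below does this for the homogeneous case $v_i = v$, where the sum over open-neighbor configurations reduces to $[1-(1-v)^{k_j}]/k_j$, and compares the long-time probabilities with the closed-form steady state. The graph, the parameter values and the number of iterations are assumptions made only for this illustration.

```python
# Iterate the single-agent equations for r(i,t) and s(i,t) with v_i = v.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # made-up graph
lam = [0.10, 0.20, 0.05, 0.15]   # separation rates (assumed)
h = [0.5, 0.8, 0.6, 0.9]         # acceptance rates (assumed)
v = 0.3                          # common opening probability (assumed)
N = len(neighbors)

k = [len(neighbors[i]) for i in range(N)]
open_any = [1 - (1 - v) ** k[i] for i in range(N)]     # at least one neighbor open
h_avg = [sum(h[j] for j in neighbors[i]) / k[i] for i in range(N)]
xi = [h_avg[i] * open_any[i] for i in range(N)]        # effective hiring rate

r = [1.0, 0.0, 0.0, 0.0]   # the agent starts employed at node 0
s = [0.0] * N
for _ in range(20000):
    r_new = [(1 - lam[i]) * r[i]
             + h[i] * sum(s[j] * open_any[j] / k[j] for j in neighbors[i])
             for i in range(N)]
    s_new = [lam[i] * r[i] + s[i] * (1 - xi[i]) for i in range(N)]
    r, s = r_new, s_new

# Closed-form steady state for comparison.
chi = 1 / sum(h[i] * h_avg[i] * k[i] * (1 / lam[i] + 1 / xi[i]) for i in range(N))
r_inf = [chi * h[i] * h_avg[i] * k[i] / lam[i] for i in range(N)]
s_inf = [chi * h[i] * k[i] / open_any[i] for i in range(N)]

print([round(x, 6) for x in r], [round(x, 6) for x in r_inf])
```

Each update conserves total probability, so the normalization $\sum_i [r(i) + s(i)] = 1$ holds at every step and not only in the steady state.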

Once $r_\infty(i)$ and $s_\infty(i)$ have been determined for the model of interest (homogeneous
or heterogeneous h, v, etc.), the distributions of the number of employed and unemployed agents
at firm i can then be computed via

(2.47)  $\Pr(L_i) = \binom{H}{L_i} [r_\infty(i)]^{L_i} [1 - r_\infty(i)]^{H-L_i},$

and

(2.48)  $\Pr(U_i) = \binom{H}{U_i} [s_\infty(i)]^{U_i} [1 - s_\infty(i)]^{H-U_i}.$

These expressions reproduce the results presented in the previous sections, and can
also be used to calculate higher moments of the distributions on the basis of the steady
state distributions for a single agent. For instance, the variance for $L_i$ and $U_i$ can be
calculated via well-known expressions for binomial distributions, yielding

(2.49)  $\mathrm{var}(L_i) = \langle L_i^2 \rangle - \langle L_i \rangle^2 = H r_\infty(i) [1 - r_\infty(i)]$

and

(2.50)  $\mathrm{var}(U_i) = \langle U_i^2 \rangle - \langle U_i \rangle^2 = H s_\infty(i) [1 - s_\infty(i)].$


From the practical standpoint, it is useful to realize that, if $r_\infty(i)$, $s_\infty(i)$ need to
be estimated (say numerically), (2.47), (2.48), (2.49), (2.50), and other quantities
that can be calculated as functions of $r_\infty(i)$, $s_\infty(i)$ become particularly useful because
it is no longer necessary to try to solve (2.30) and (2.31) directly, which could be
demanding for very large economies. Instead, estimates of $r_\infty(i)$ and $s_\infty(i)$ could be
utilized to arrive at meaningful results.
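The binomial form of the firm-size distribution and its moments can be checked numerically. The short Python sketch below builds the distribution of $L_i$ for a single firm and recovers the mean $H r_\infty(i)$ and the variance $H r_\infty(i)[1 - r_\infty(i)]$. The values of `H` and `r_inf` are hypothetical and serve only as an example.

```python
from math import comb

H = 50          # total number of agents (assumed)
r_inf = 0.12    # steady-state probability of being employed at firm i (assumed)

# Binomial distribution of the number of employed agents L_i at the firm.
pmf = [comb(H, L) * r_inf ** L * (1 - r_inf) ** (H - L) for L in range(H + 1)]

mean_L = sum(L * p for L, p in enumerate(pmf))
var_L = sum(L * L * p for L, p in enumerate(pmf)) - mean_L ** 2

print(mean_L, H * r_inf)                   # mean versus H * r_inf
print(var_L, H * r_inf * (1 - r_inf))      # variance versus H * r_inf * (1 - r_inf)
```

The same computation applies to $U_i$ after replacing `r_inf` by the corresponding value of $s_\infty(i)$.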

2.5. Employment tenure and unemployment spells. In our model, the
mechanism for job separation is characterized by a geometric distribution. Hence, an
agent employed in firm i has a probability $\lambda_i$ to be separated per time step. Therefore,
the distribution $\Pr(t_i^{(e)})$ of employment duration $t_i^{(e)}$ (also known as job tenure) is
given by

(2.51)  $\Pr(t_i^{(e)}) = (1-\lambda_i)^{t_i^{(e)}-1} \lambda_i.$

The average time of employment in firm i is given by

(2.52)  $\langle t_i^{(e)} \rangle = \frac{1}{\lambda_i}.$

A similar calculation provides us with the duration $t_i^{(u)}$ of unemployment spells.
In particular, the probability for an agent to find a job among the neighbors of firm i
(i.e., the effective rate of hiring) depends on $\xi_i := \sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)$, and therefore, the
distribution of unemployment spells is given by

(2.53)  $\Pr(t_i^{(u)}) = (1-\xi_i)^{t_i^{(u)}-1} \xi_i,$

with average unemployment duration

(2.54)  $\langle t_i^{(u)} \rangle = \frac{1}{\xi_i}.$

In the case of homogeneous probability for firms to accept applications ($v_i = v$ for all
i), unemployment spells are characterized by $\xi_i^{(v)} = \langle h \rangle_{\Gamma_i} [1 - (1-v)^{k_i}]$. This allows
us to rewrite (2.36) and (2.45) in the more intuitive forms

(2.55)  $\langle U_i \rangle^{(v)} = \frac{\rho h_i k_i}{1-(1-v)^{k_i}} = \frac{\rho h_i \langle h \rangle_{\Gamma_i} k_i}{\xi_i^{(v)}}$

and

(2.56)  $s_\infty^{(v)}(i) = \frac{\chi h_i k_i}{1-(1-v)^{k_i}} = \frac{\chi h_i \langle h \rangle_{\Gamma_i} k_i}{\xi_i^{(v)}},$


which also exposes the symmetrical nature of the values for average employment and 
unemployment as we see in detail in (2.57) below. 

The characteristic times calculated above help us provide an intuitive understanding
of (2.37) and (2.46). Specifically, note that the total time spent employed
for $t_i^{(e)}$ time steps and subsequently unemployed for $t_i^{(u)}$ time steps is distributed as
the convolution of the two geometric distributions above, and the average of
this total time is $1/\lambda_i + 1/\xi_i$. From this, one realizes that the terms in brackets
in the denominators of (2.37) and (2.46) correspond to the average durations of
employment plus unemployment of agents at firm i. The factor $h_i \langle h \rangle_{\Gamma_i} k_i$ corresponds
to the probability to enter and exit each edge connected to i. Therefore, $\rho$ measures
the amount of overall job mobility in the entire economy, and $\chi$ corresponds to the
per-agent $\rho$.

Expressions (2.35) and (2.55) are very similar, the only difference being the exchange
of $\lambda_i$ and $\xi_i^{(v)}$. By taking the quotient of (2.35) and (2.55) we obtain

(2.57)  $\frac{\langle L_i \rangle^{(v)}}{\langle U_i \rangle^{(v)}} = \frac{\xi_i^{(v)}}{\lambda_i}.$


In Sec. 3.4, we compare this result with empirical data (see Eq. (3.7)). This relation is 
potentially useful because (2.57) only involves model quantities that can be measured 
in the empirical data. Note that the ratio in (2.57) measures how much time an agent 
spends employed at firm i compared to the time the agent spends looking for a job 
among the neighbors of i. 
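The relation between the two characteristic times and the ratio (2.57) can be verified with a few lines of Python. The sketch below sums the geometric distributions numerically to obtain the mean tenure and the mean unemployment spell, and then forms their ratio. The rates `lam_i` and `xi_i` and the truncation `t_max` are hypothetical values chosen only for this example.

```python
def geometric_mean_duration(rate, t_max=10000):
    """Mean of Pr(t) = (1 - rate)**(t - 1) * rate, truncated at t_max steps."""
    return sum(t * (1 - rate) ** (t - 1) * rate for t in range(1, t_max + 1))


lam_i = 0.05   # separation rate of firm i (assumed)
xi_i = 0.20    # effective hiring rate around firm i (assumed)

mean_tenure = geometric_mean_duration(lam_i)   # approaches 1 / lam_i
mean_spell = geometric_mean_duration(xi_i)     # approaches 1 / xi_i
ratio = mean_tenure / mean_spell               # approaches xi_i / lam_i

print(mean_tenure, mean_spell, ratio)
```

With these values an agent spends on average four times longer employed at firm i than searching among its neighbors, which is the content of (2.57).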


2.6. Application of the model. An attempt to apply our model to real-world 
situations runs into the difficulty that data is not available for all parameters. In 
particular, we do not have data to determine the rates of opening of positions $\{v_i\}$ nor
hiring rates $\{h_i\}$ of firms. These parameters, relevant from the economic standpoint
as they allow calculation of endogenous effects for each firm, are not usually collected 
by statistical authorities. These difficulties, however, can be overcome by expressing 
the equations of the system in terms of available information. 

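As observed above in (2.28), $\Lambda$ can be written in terms of outflows which can be
directly measured from data, and then inserted into (2.30). We also require a method
to express the uniqueness condition (2.31) in terms of the data. Note that (2.31),
with the use of (2.13) and the definition of $\xi_i$, can be rewritten as

(2.58)  $H = \sum_{i=1}^{N} \left( \langle L_i \rangle + \langle U_i \rangle \right) = \sum_{i=1}^{N} \left[ \frac{1}{\lambda_i} + \frac{1}{\sum_{\gamma_i \neq \emptyset} \langle h \rangle_{\gamma_i} \Pr(\gamma_i)} \right] \lambda_i \langle L_i \rangle = \sum_{i=1}^{N} \left( \frac{1}{\lambda_i} + \frac{1}{\xi_i} \right) \lambda_i \langle L_i \rangle,$

where both $\lambda_i$ and $\xi_i$ can be measured via average employment and unemployment
times of workers at firm i.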

Using (2.58), one can construct a modified matrix $\tilde{\Lambda}$ where one of the rows from
$\Lambda$ is eliminated (any row) and substituted by a row based on (2.31). This leads
to a non-homogeneous linear set of equations with a unique solution. One possible
concrete form of this can be to eliminate the last row to generate $\tilde{\Lambda}_{ij}$ with the form

(2.59)  $\tilde{\Lambda}_{ij} = \begin{cases} \Lambda_{ij} & [1 \le i \le N-1] \\ \frac{1}{\lambda_j} + \frac{1}{\xi_j} & [i = N]. \end{cases}$

With $\tilde{\Lambda}$ defined in this way, one can further introduce

(2.60)  $Y_j = \begin{cases} 0 & [1 \le j \le N-1] \\ H & [j = N], \end{cases}$

leading to the matrix equation

(2.61)  $\tilde{\Lambda} X = Y,$

where $X$ is still the column matrix of employment sizes as in (2.30).
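The row-replacement construction can be sketched in Python as follows. For concreteness the example uses the homogeneous-opening-rate form of the matrix, replaces its last row by the normalization row, and solves the resulting linear system with a small Gaussian elimination routine. The graph, the parameter values and the helper `solve` are assumptions introduced only for this illustration; any standard linear solver could be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # made-up graph
lam = [0.10, 0.20, 0.05, 0.15]   # separation rates (assumed)
h = [0.5, 0.8, 0.6, 0.9]         # acceptance rates (assumed)
v, H = 0.3, 1000.0               # opening probability and population (assumed)
N = len(neighbors)
k = [len(neighbors[i]) for i in range(N)]
h_avg = [sum(h[j] for j in neighbors[i]) / k[i] for i in range(N)]
xi = [h_avg[i] * (1 - (1 - v) ** k[i]) for i in range(N)]

# Homogeneous-v matrix: A_ij h_i / (k_j <h>_{Gamma_j}) - delta_ij.
Lam = [[(h[i] / (k[j] * h_avg[j]) if i in neighbors[j] else 0.0)
        - (1.0 if i == j else 0.0) for j in range(N)] for i in range(N)]

# Drop the last row and substitute the normalization row 1/lambda_j + 1/xi_j.
Lam_mod = [row[:] for row in Lam[:-1]]
Lam_mod.append([1 / lam[j] + 1 / xi[j] for j in range(N)])
Y = [0.0] * (N - 1) + [H]

X = solve(Lam_mod, Y)                       # X_i = lambda_i <L_i>
L_avg = [X[i] / lam[i] for i in range(N)]   # employed at each firm
U_avg = [X[i] / xi[i] for i in range(N)]    # unemployed associated with each firm
print(L_avg, U_avg)
```

Because the columns of the original matrix sum to zero, any single row is redundant, which is why replacing one row by the normalization condition yields a system with a unique solution.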

As a final ingredient, we relax the condition that G is a connected graph, and 
allow for the presence of C disconnected components. We continue to assume that the
structure of the components is generally non-trivial in real data, which is to say that 
in what follows we do not expand on the cases where G has isolated nodes or very 
small components (containing, say, only two nodes) where the behavior is trivially 
simple but requires more technicalities to be described with rigor. 

It is known that for a graph with C connected components, the rank of the
graph Laplacian is $N - C$. This reflects the fact that the dynamics of walkers
on each component runs independently of the other components due to the lack of 
connections among them. This reduced rank value corresponds to the need for C 
distinct conditions stemming from the number of workers on each isolated component. 
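For a given initial distribution of such workers $\{H_c\}_{c=1,\ldots,C}$ over
the components, where $\sum_{c} H_c = H$, one obtains a set of C conditions

(2.62)  $\sum_{i \in G_1} \left( \langle L_i \rangle + \langle U_i \rangle \right) = H_1, \quad \ldots, \quad \sum_{i \in G_C} \left( \langle L_i \rangle + \langle U_i \rangle \right) = H_C,$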


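where $G_c$ is the c-th component among C. The modified matrix $\tilde{\Lambda}$ can now be constructed
in a similar way as for the single component case. As a prior step to the construction,
we relabel the nodes so that the adjacency matrix becomes block diagonal, with each
block corresponding to the adjacency matrix of a single connected component. In this
way, we can write for each component fundamentally the same equation (2.61) now
indexed by c. We can also write a single matrix equation, where the block diagonal
shape of $A$ has been introduced, and $\tilde{\Lambda}$ takes the form

(2.63)  $\tilde{\Lambda}_{ij} = \begin{cases} \Lambda_{ij} & [1 \le i \le S_1 - 1] \\ \left( \frac{1}{\lambda_j} + \frac{1}{\xi_j} \right) \mathbb{1}[j \in G_1] & [i = S_1] \\ \vdots & \\ \Lambda_{ij} & [S_{C-1} < i \le S_C - 1] \\ \left( \frac{1}{\lambda_j} + \frac{1}{\xi_j} \right) \mathbb{1}[j \in G_C] & [i = S_C] \end{cases}$

with

(2.64)  $Y_j = \begin{cases} 0 & [1 \le j \le S_1 - 1] \\ H_1 & [j = S_1] \\ \vdots & \\ 0 & [S_{C-1} < j \le S_C - 1] \\ H_C & [j = S_C] \end{cases}$

where $S_c$ is the label of the last node of component c after relabeling, and $X_i = \lambda_i \langle L_i \rangle$, as before.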

The previous comments regarding the number of connected components can be 
related to the dynamics of the individual agent treated in Sec. 2.4. It is clear that, 
in the case of C components, a prerequisite to analyzing the problem is to provide 
an initial condition that specifies the location of the agent. If the agent is placed at 
component c at time t = 0, solving for $r_\infty(i)$ and $s_\infty(i)$ provides information relevant
to c only. If, on the other hand, we specify that at time t = 0 the agent can be found 
in component c with a probability H c /H , then we can develop analysis to describe 
the more general case of agents distributed across the graph. 
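The bookkeeping needed for a graph with several components can be sketched in Python: find the connected components, relabel the nodes so that each component occupies a contiguous block, and compute the number of workers $H_c$ on each component together with the probabilities $H_c/H$. The graph, the initial allocation `workers_at` and all names below are hypothetical and are used only to illustrate the procedure.

```python
from collections import deque


def connected_components(neighbors):
    """Return the connected components of an undirected graph as sorted lists."""
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in neighbors[i]:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        components.append(sorted(component))
    return components


# Two interleaved triangles: {0, 3, 4} and {1, 2, 5} (made-up example).
neighbors = {0: [3, 4], 1: [2, 5], 2: [1, 5], 3: [0, 4], 4: [0, 3], 5: [1, 2]}
workers_at = {0: 100, 1: 50, 2: 25, 3: 10, 4: 40, 5: 75}   # initial allocation (assumed)

components = connected_components(neighbors)
order = [i for comp in components for i in comp]    # relabeling, block by block
block_ends = []                                     # 1-based label of each block's last node
total = 0
for comp in components:
    total += len(comp)
    block_ends.append(total)

H_c = [sum(workers_at[i] for i in comp) for comp in components]
H = sum(H_c)
weights = [x / H for x in H_c]    # probability of finding the agent in component c

print(components, order, block_ends, H_c, weights)
```

Each entry of `H_c` provides one normalization condition, so a graph with C components requires C conditions in total.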

In this section, we have developed one particular approach for solving (2.30), but
other approaches are possible. Among those, one of the most general is to apply
singular value decomposition to determine a decomposition for the space of solutions
of $\Lambda$, including the kernel of the matrix, and then apply the uniqueness conditions.
Ultimately, the application of a particular method is a practical matter that takes
into consideration aspects of the problem that may go beyond the theoretical ones.

3. Empirical analysis. The theory developed above rests on a set of assumptions
described in the previous section; namely, the presence of a well defined network,
a steady state flow of workers during significant periods of time, and a simple
approximation to the possible behavior of workers as they leave firms and seek new 
employment. The aim of this section is to explore empirically the model, both by 
corroborating that the assumptions we make are close to the observed behavior of 
the system, and by studying some of the consequences of our model in terms of how 
well it predicts the statistical characteristics of real systems. We find that both our 
assumptions and the results that the model predicts fit the data well. 

We now describe our approach in more detail. The first assumption employed 
is that the edges of the graph can reasonably be considered static. We test this 
by introducing the notion of persistence of the graph, i.e., the property that over 
successive time intervals, many edges produced by agents' job transitions between
firms re-occur, indicating that the graph does not change over time in a random way,
but instead exhibits a static behavior (at least partially). By restricting our model to
static networks, our approach assumes labor markets are static, and although this is
not entirely true, it appears to be sufficiently accurate to capture the main behavior
of the system, as our analysis indicates.⁵

The second assumption tested is that the system is close to the steady state. 
To confirm this assumption, we study the histograms of agent in and out flows at 
firms, finding that, by far, the most typical situation is that these flows are virtually 
balanced for each firm, indicating that in fact the system is typically not growing or 
declining, but rather remaining steady on a firm by firm basis. 

In order to characterize the agreement between empirical data and our model, we
also measure the per-firm values of the separation rate $\lambda_i$ and the job finding rate
$\xi_i$. These two quantities, which we assume to be related to the firms rather than the
individual agents, appear to be satisfactory quantities when comparing model and
data.

As a way to illustrate the fact that our framework captures some of the relevant
features of job mobility, we test the consequences of the homogeneous opening rates
model ($v_i = v$) against data. In particular, we study the ratio of $\langle L_i \rangle$ and $\langle U_i \rangle$ vs. the
ratio of $\xi_i$ and $\lambda_i$ (in effect checking eq. (2.57)), and also whether $\langle L_i \rangle$ and $\langle U_i \rangle$ are
consistent with the predictions of (2.35) and (2.55). We find that indeed the model
and data agree sufficiently to accept our approach as a plausible way to model job
mobility.

We begin our detailed analysis with a description of the data we use, and then 
proceed to present the tests mentioned above. 

⁵ The accuracy of this assumption is affected by time scales, as over very long periods of, say, a
decade or more, one could not expect static behavior. But this is acceptable in our framework, as
one cannot expect to predict the full economy over such time scales anyway.

3.1. The data. We use a high-resolution dataset and a support dataset where
the information is more aggregated. The main dataset consists of employer-employee
matched records at a daily resolution. It is a sample of $\approx 4 \times 10^5$ workers and
$\approx 8 \times 10^4$ firms provided by the “Instituto Mexicano del Seguro Social” (Mexican
Social Security Institute or IMSS). The workers were sampled from the universe of
individuals who were registered at IMSS between 2000 and 2008 (all individuals who
work in the private sector are registered at IMSS). Then, the complete employment
history of each worker was extracted from the database (which includes any activity
before 2000). The fraction of workers captured in this dataset is approximately 1%
of the total workforce of Mexico in the private sector.

These employer-employee matched records are constructed in the following way. 
For each worker, every time there is a job transition between employment and unemployment
(in either direction), the worker’s record is updated: i) when hired into
a firm, the record contains the day at which the worker starts employment and a 
unique identifier for the firm (consistent for all workers in the data set), and ii) when 
separated from a firm, the day in which this occurred. Note that the dataset does not 
track firms directly, and thus the only means of tracking them is through individuals 
in the dataset. 

We employ the IMSS dataset for our main analysis due to its high resolution. The 
support dataset consists of employer-employee matched records from the universe of 
employed workers and firms in Finland, constructed from social security records, and 
provided by Statistics Finland. These records consist of annual observations that track 
each employed worker in the economy between 2000 and 2008. Each year contains 
approximately 200,000 firms and 1.5 million individuals. 

3.2. Persistent flows. There is some empirical evidence that, whenever a person
leaves firm i and then gets a job at firm j, transitions between i and j (in either
direction) are likely to be repeated in the future [24]. If indeed such transitions are 
repeated, we consider them to be persistent. We employ our datasets in order to 
measure persistence in both Mexico and Finland. 

Let t and t+1 be the starting and ending times of a given period of job transitions,
t+1 to t+2 a second period, etc.⁶ We denote such time intervals with
$I(t;1) := [t, t+1]$ and use $I(t) := I(t;1)$ for simplicity. To construct a graph of firms and
transitions based on empirical data, we proceed as follows: when we observe a worker
transitioning from firm i to j or vice versa, we introduce an undirected edge between
i and j; we do not consider weights, so once an edge has been created, additional
$i \leftrightarrow j$ transitions in the same period have no further consequences in the graph
structure. In addition, our data allows us to observe firms that may not have had any
incoming or outgoing job transitions, and these firms are encoded as isolated nodes.
The graph $G_{I(t)}$ is constituted by all the edges occurring in the period $I(t)$, the nodes
to which those edges are incident, and the isolated nodes that display no transitions.
To measure persistence, the relevant question is: how many edges in period $I(t+1)$
also occurred in the previous period $I(t)$? To assess this, we define

(3.1)  $PE_t = E(G_{I(t)}) \cap E(G_{I(t+1)}),$

the set of common edges between graphs $G_{I(t)}$ and $G_{I(t+1)}$, where $E(\cdot)$ is the set of edges
of the argument graph. Then, $|PE_t| / |E(G_{I(t+1)})|$ is the fraction of the edges $E(G_{I(t+1)})$
that are persistent.
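The construction of the transition graphs and of the persistent edge set can be sketched in a few lines of Python. The firm labels and the two lists of job transitions below are invented for illustration; only the definitions (undirected, unweighted edges and the intersection of consecutive edge sets) follow the text.

```python
def transition_graph(transitions):
    """Undirected, unweighted edge set built from (firm_from, firm_to) pairs."""
    return {frozenset((a, b)) for a, b in transitions if a != b}


# Hypothetical job transitions observed in two consecutive periods.
period_t = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]
period_t1 = [("B", "A"), ("C", "B"), ("D", "E"), ("A", "E")]

E_t = transition_graph(period_t)      # edges A-B, B-C, C-D
E_t1 = transition_graph(period_t1)    # edges A-B, B-C, D-E, A-E

PE_t = E_t & E_t1                       # persistent edges, as in (3.1)
persistence = len(PE_t) / len(E_t1)     # fraction of the edges of period t+1

print(sorted(tuple(sorted(e)) for e in PE_t), persistence)
```

Repeated or reversed transitions between the same pair of firms collapse into a single edge, so the two A-B transitions in the first period count once.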

⁶ We concentrate on annual consecutive periods. However, non-consecutive periods yield similar
results.

This concept of persistence captures repetition of job transitions. However, a
random job search process can produce repeated job transitions by chance. Therefore,
persistence is only meaningful to the extent that it occurs more frequently than what a
random process would lead to. Furthermore, one should be able to define confidence 
intervals addressing whether the persistence found could emerge as a consequence 
of random fluctuations. A natural random (null) model one could use to compare 
persistence in real vs. random job search is to allow any individual looking for a job 
to apply and potentially fill any of the vacancies offered by the firms of the graph. 
In this model, firms have a defined number of vacancies and of job-seekers (both 
determined from our datasets) and individuals are allowed to apply and potentially 
be hired into any of those jobs (except those of their last employer). Below, we
develop a set of statistical tests to determine confidence intervals for persistence using 
this approach, and apply it to the IMSS data from Mexico. Given that the absence 
of an edge is potentially meaningful because it may signal a genuine lack of affinity 
between firms that never connect, we perform additional confidence testing to take 
this into account. We find that there is a large degree of confidence that persistence is 
indeed present, and that adding tests to account for lack of connections only increases 
the confidence levels. We should briefly mention here that our null model is indeed an 
appropriate test to compare to current economic thinking, which assumes aggregate 
matching processes that ignore firms and their contributions into the heterogeneity of 
the real job transition process. 


3.2.1. Hypothesis testing for persistence. The null models are constructed
independently for every pair of consecutive time intervals. For one such pair of time
intervals, we first determine for each firm i and time interval $I(t)$ or $I(t+1)$ the number
of hires into i coming from other firms ($\eta_{\cdot,i}(t)$ and $\eta_{\cdot,i}(t+1)$) and job separations ($\lambda_{u,i}(t)$
and $\lambda_{u,i}(t+1)$) that occur. For each of the intervals, for instance $I(t)$, a random job
transition graph is built by taking for each node the number $\eta_{\cdot,i}(t)$ as vacancies
that need to be filled by i, and $\lambda_{u,i}(t)$ as the number of individuals that leave i and
seek other jobs. With these two numbers as constraints for every node, the vacancies
and job-seekers are then randomly matched over the entire set of nodes, forbidding
job-seekers from going back to their previous employer. This approach is basically equivalent
to a random configuration model (for a review, see [26]). A number M = 300 of
such random realizations is computed for each interval $I(t)$, generating an ensemble
of random graphs. We use this ensemble to obtain the distribution of the statistic

(3.2)  $\psi_t = \frac{|PE'_t|}{|PE_t|},$

where $PE'_t$ and $PE_t$ are defined via (3.1), with $PE'_t$ representing the persistent
edges between random graph samples, and $PE_t$ the persistent edges
between the corresponding empirical graphs. There are $\binom{M}{2}$ values of $PE'_t$ generated
by our procedure. The statistic $\psi_t$ measures the extent to which the global random
matching mechanism explains the observed transitions over the ensemble built for
multiple pairs of years covering an overall span of 8 years.
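A minimal Python sketch of the global random matching is given below. It matches job-seekers to vacancies uniformly at random, rejects any matching in which a job-seeker returns to the previous employer, and evaluates the ratio in (3.2) for one pair of random graphs. The firm counts, the rejection-sampling strategy, the function names and the value `PE_empirical_size` are all assumptions made for this example and are not claimed to reproduce the procedure used for the results reported here.

```python
import random
from collections import Counter


def random_matching(separations, hires, rng, max_tries=10000):
    """Randomly assign job-seekers to vacancies, never back to the firm just left.

    separations[f]: number of workers leaving firm f (job-seekers).
    hires[f]: number of vacancies to be filled at firm f.
    """
    seekers = [f for f, n in separations.items() for _ in range(n)]
    vacancies = [f for f, n in hires.items() for _ in range(n)]
    if len(seekers) != len(vacancies):
        raise ValueError("total separations and total hires must match")
    for _ in range(max_tries):
        rng.shuffle(vacancies)
        if all(origin != dest for origin, dest in zip(seekers, vacancies)):
            return list(zip(seekers, vacancies))
    raise RuntimeError("no valid matching found")


def edge_set(transitions):
    """Undirected, unweighted edges from (origin, destination) pairs."""
    return {frozenset(pair) for pair in transitions}


rng = random.Random(0)                       # fixed seed for reproducibility
separations = {"A": 2, "B": 1, "C": 1}       # hypothetical counts per firm
hires = {"A": 1, "B": 2, "C": 1}

null_t = random_matching(separations, hires, rng)     # random graph for I(t)
null_t1 = random_matching(separations, hires, rng)    # random graph for I(t+1)
PE_null = edge_set(null_t) & edge_set(null_t1)        # persistent edges under the null

PE_empirical_size = 3                        # hypothetical |PE_t| measured from data
psi = len(PE_null) / PE_empirical_size
print(null_t, null_t1, psi)
```

In such a tiny example almost every edge repeats by chance; the test becomes informative only when the number of firms is large, as in the datasets used here.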

As mentioned above, the absence of an edge can contain relevant information 
about the lack of affinity between pairs of nodes. In the economics literature this 
would be thought of as a friction. Therefore, if the global search model explains both 
the observed persistence as well as the persistent lack of labor flows between firms, one 
would require an additional statistic to capture the persistence of lack of connections. 
[Figure 1. Panel titles: kernel densities for levels of q (Mexico, t = 1999); rejection zones (Mexico, 90% confidence); rejection zones (Finland, 90% confidence).]

Fig. 1. Measures of persistence for job transitions. Top: Distributions of $\psi$ from (3.2) when
applied in the context of Sec. 3.2.2 for various q. These distributions were generated from a Monte
Carlo procedure. As the probability of local search q increases, the distribution of $\psi$ shifts to the right.
When $\psi$ falls inside the distribution (e.g., below the 90th percentile) it means that the corresponding
level of q is enough to not reject the null model. This level is approximately q = 0.5 for $\psi$ and q = 0.8
for $\varrho$. Bottom: Rejection zones. The panels show the levels of q for which the null hypothesis is
rejected. The black area represents the case in which the null model is rejected for both $\psi$ and $\varrho$. For
a q in the gray zone, the null model is rejected only for $\psi$. In the white area the null model explains
the empirical levels of both $\psi$ and $\varrho$. Synthetic distributions were created for values of $q \in [0,1]$
equally spaced by 0.1.


We therefore define

(3.3)  $\varrho_t = \frac{|PF'_t|}{|PF_t|},$

where $PF$ denotes the set of persistent frictions (the pairs of firms that are not
connected in the network).

Using Monte Carlo simulation, we performed a one-sided test for $\psi$ and $\varrho$. The
null hypothesis is that, under a global search process, we would expect $\psi = 1$ and
$\varrho = 1$. The global search hypothesis was rejected with 99% confidence in both cases.
For an illustration, the top panel in figure 1 shows the probability distribution of
$\psi$ (the one on the far left) generated from the Monte Carlo procedure. Clearly, the
confidence interval of the distribution is far below 1 (the mean is $\psi = 0.001$), implying
that the global search fails to explain the persistent labor flows between firms.

3.2.2. A tunable model for the contribution of persistence. The global
search mechanism fails to explain both the persistent edges and frictions across the 
eight years of data. This is consistent with intuition, given the large space of possible 
matches that can emerge when all job vacancies are accessible to all job-seekers. If in 
reality, as our results suggest, job-seekers use a subset of possible job transitions, the 
matching mechanism of our null model should restrict the job search. Therefore, we 
introduce an additional mechanism: with probability q a job seeker searches through
the graph and with probability $1 - q$ searches globally. Clearly, when $q \to 1$ we obtain
the mechanism proposed in section 2 and when $q \to 0$, the search is global over all
firms.

With the local search mechanism in place, we need an additional assumption for
the null model. Consider the null networks $G_{I(t)}$ and $G_{I(t+1)}$ with corresponding sets
of edges $E(G_{I(t)})$ and $E(G_{I(t+1)})$. Since we are in a steady state, it is reasonable
to assume that any edge in $G_{I(t)}$ can also exist in $G_{I(t+1)}$ (and vice versa), even if
it is not observed in the data. Then, when a worker searches locally under the null
model, it does so by using the network $G^*_t$, such that $E^*_t = E(G_{I(t)}) \cup E(G_{I(t+1)})$.
This assumption captures the time-invariant aspect of the steady state, and allows $\varrho$
to take values higher than 1.

We compute the null model for different levels of q, so we can answer the question:
what is the minimum q needed to generate at least the level of persistence observed
in empirical data? First, we randomize the matches between job seekers and vacancies,
generating new datasets. Then, we construct null networks from these datasets.
Next, we compute (3.2) and (3.3) for each pair of null networks to generate their
distributions. Finally, we use these distributions to perform a one-sided test for
each statistic. If the statistic falls beyond the 90th percentile, the null model does
not explain the persistence of edges or frictions. When we find a q such that the
statistic is below the 90th percentile we cannot reject the null model. The smallest q
under which we cannot reject the null model for either $\psi$ or $\varrho$ is an indicator of at
least how frequently people should search locally in a model in order to explain the
structure of the empirical data.
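The search for the smallest acceptable q can be sketched as follows in Python. The sketch assumes that the null model is not rejected at a given q when the value of the statistic expected under the null (here 1) does not exceed the 90th percentile of the Monte Carlo distribution obtained for that q. This reading of the test, the nearest-rank percentile rule, and the synthetic samples in `samples_by_q` are assumptions made only for this example.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100    # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def smallest_accepted_q(samples_by_q, reference=1.0, pct=90):
    """Smallest q whose Monte Carlo distribution reaches the reference value."""
    for q in sorted(samples_by_q):
        if reference <= percentile_nearest_rank(samples_by_q[q], pct):
            return q
    return None


# Synthetic Monte Carlo samples of the statistic for a few values of q.
samples_by_q = {
    0.0: [0.001] * 10,
    0.4: [0.30, 0.32, 0.35, 0.36, 0.38, 0.40, 0.41, 0.43, 0.45, 0.47],
    0.8: [0.85, 0.90, 0.92, 0.95, 0.98, 1.00, 1.02, 1.05, 1.08, 1.10],
    1.0: [1.10, 1.15, 1.20, 1.22, 1.25, 1.28, 1.30, 1.33, 1.35, 1.40],
}

q_min = smallest_accepted_q(samples_by_q)
print(q_min)
```

Applying the same function to the samples of each statistic separately and taking the larger of the two results gives the level of q at which neither test rejects the null model.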

Figure 1 shows the results from this analysis. In general, a higher q is needed to 
explain both persistent edges and frictions than just edges. An approximate estimate 
suggests, in order to explain empirical persistence, a job-seeker needs to search on the 
network at least 75% of the time. This result is consistent across both datasets and 
strongly suggests that the network approach is much more empirically relevant than 
the global search one, providing a solid motivation for the model developed in this 
paper. 

3.3. Model validation. In this article we concentrate on the steady state behavior
of the system. In order to validate this choice, we first study the distribution
of $\langle \lambda_i L_i \rangle - \langle |\eta_i| \rangle$ from the data. From this point on, we concentrate on the IMSS
data since its daily resolution allows us to identify the duration of employment and
unemployment spells of each individual, which is crucial to our analysis. Then, if the
system is close to the steady state, the distribution of $\langle \lambda_i L_i \rangle - \langle |\eta_i| \rangle$ should be
concentrated around 0. Figure 2 corresponds to the distribution of average agent daily
flows over the period of 1 year into and out of a firm, with a pronounced peak around
zero, which corresponds to our intuition. The averages have been taken by using the
periods of observations of the workers associated with firms.

Fig. 2. Distribution of the difference between flow into and out of a firm. There are 49146
firms in this distribution.

Next, we determine the rates of separation and hiring. To estimate the values of
separation rates, we proceed by tracking all employees of a firm that are observed to
enter and exit that firm. Separation of an agent from a job is characterized by (2.51).
In order to estimate $\lambda_i$ for a firm, we perform a maximum likelihood estimation. For a
sample of agents of size $s_i^{(e)}$, the log of the joint distribution of employment durations
$\{t_{i,1}^{(e)}, \ldots, t_{i,s_i^{(e)}}^{(e)}\}$ in a given firm is

(3.4)  $\log \prod_{n=1}^{s_i^{(e)}} \Pr(t_{i,n}^{(e)}) = \log(1-\lambda_i) \sum_{n=1}^{s_i^{(e)}} \left( t_{i,n}^{(e)} - 1 \right) + s_i^{(e)} \log \lambda_i,$

and the maximum likelihood (ML) estimator is

(3.5)  $\hat{\lambda}_i = \frac{s_i^{(e)}}{\sum_{n=1}^{s_i^{(e)}} t_{i,n}^{(e)}},$

the value of $\lambda_i$ that maximizes (3.4). The effective rate of hiring can be estimated
in the same way, with the ML estimator given by

(3.6)  $\hat{\xi}_i = \frac{s_i^{(u)}}{\sum_{n=1}^{s_i^{(u)}} t_{i,n}^{(u)}},$

where there are $s_i^{(u)}$ unemployed individuals with unemployment times $\{t_{i,1}^{(u)}, \ldots, t_{i,s_i^{(u)}}^{(u)}\}$.
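The maximum likelihood estimate of a geometric rate is the number of observed spells divided by their total duration. The following Python sketch computes it and checks that the log-likelihood is indeed lower at nearby rates. The list `tenures` is a hypothetical sample of employment durations (in time steps) and is not taken from the data.

```python
from math import log


def ml_rate(durations):
    """ML estimate of the rate of a geometric distribution with support t >= 1."""
    if not durations or min(durations) < 1:
        raise ValueError("durations must be non-empty and at least one time step")
    return len(durations) / sum(durations)


def log_likelihood(rate, durations):
    """Log of the joint geometric probability of the observed durations."""
    return sum((t - 1) * log(1 - rate) + log(rate) for t in durations)


tenures = [3, 10, 1, 7, 4]     # hypothetical employment durations at one firm
lam_hat = ml_rate(tenures)     # 5 spells over 25 time steps

print(lam_hat, log_likelihood(lam_hat, tenures))
```

The same two functions apply to unemployment spells, in which case the estimate corresponds to the effective rate of hiring.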

The measurements of $\lambda_i$ and $\xi_i$ can be studied via their distributions, as shown in
Figs. 3(a) and (b). For the distribution of $\lambda_i$, the sample of individuals was restricted
to those who began and ended their tenure of employment within the time frame of
the data; similar considerations were applied to the distribution of $\xi_i$, restricting the
sample to individuals that become unemployed and subsequently found employment
during the window of observation. The distributions of $\lambda_i$ and $\xi_i$ both exhibit decaying
heavy tails, indicating a wide variation in the rates of agent separation or hiring.

Fig. 3. (a) Distribution of values of $\lambda_i$ over firms. In this case, the sample size is 83450. (b) Distribution
of $\xi_i$ over firms, where the sample size is 83450.


3.4. An illustration: homogeneous opening rates. The analysis presented 
above supports a picture of considerable heterogeneity in real economic systems. 
Therefore, a full treatment of the data is likely to require detailed application of our 
model, accompanied by robust statistical analysis that is yet to be fully developed. 

However, for the purposes of illustration in this article, it is useful to perform 
some basic comparisons between the data and some version of our model. Given the 
absence of information for {vi} and {hi}, it seems reasonable to compare a model that 
simplifies at least one of these parameters while assuming the other continues to be 
heterogeneous. This provides some flexibility so that the model is able to cope with at 
least some level of complexity from the real data. Therefore, we chose to compare the 
data with the model characterized by homogeneous rates Vi = v for firms to accept 
applications (opening rates). We find that, even for this simple case, there is evidence 
to support the plausibility of our approach. 

As a first test, we explore the ratio (2.57), which is convenient because it only
contains directly measured parameters. Note that here, since all the parameters
emerge from measurement, we are not concerned with using the superindex (v) to
symbolize the homogeneity in v. Using the dataset from Mexico, we estimated $\langle L_i \rangle$
and $\langle U_i \rangle$ for 2008. To ensure independence across the errors, we estimated $\xi_i$
and $\lambda_i$ from observations of employment and unemployment spells that concluded at
least three years prior to 2008. We excluded firms for which $\langle U_i \rangle = 0$ and estimated $\alpha$
and $\beta$ defined by


(3.7) 




Due to the large variance heterogeneity in the data, we make use of the random
sample consensus algorithm (RANSAC) [25] in order to estimate $\alpha$ and $\beta$. The
algorithm randomly samples the data in order to discriminate the outliers and fits (3.7)
via OLS to the inliers iteratively. Since the RANSAC algorithm is non-deterministic,
the estimators vary from run to run. In order to illustrate the coherence of the model,
we performed 10,000 estimations using this procedure and analyzed the distribution
of $\alpha$ and $\beta$.
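The procedure can be sketched in a few lines of pure Python. This is an illustrative stand-in rather than the implementation used for the results reported here: it assumes a generic linear relation y = alpha + beta * x between two hypothetical variables, and the helper names (`ols_fit`, `ransac_line`), the inlier threshold, the number of trials, and the synthetic data are all assumptions made for the example.

```python
import random


def ols_fit(points):
    """Closed-form least-squares fit of y = alpha + beta * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def ransac_line(points, n_trials=200, sample_size=2, threshold=0.5, seed=None):
    """Fit a line by RANSAC: keep the largest consensus set, then refit by OLS."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        sample = rng.sample(points, sample_size)
        try:
            alpha, beta = ols_fit(sample)
        except ValueError:
            continue  # degenerate sample (identical x values); draw again
        inliers = [(x, y) for x, y in points
                   if abs(y - (alpha + beta * x)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise RuntimeError("no consensus set found")
    return ols_fit(best_inliers)


# Synthetic data: points close to y = x plus a handful of gross outliers.
data_rng = random.Random(0)
clean = [(x / 10, x / 10 + data_rng.uniform(-0.05, 0.05)) for x in range(50)]
outliers = [(data_rng.uniform(0, 5), data_rng.uniform(10, 20)) for _ in range(10)]
data = clean + outliers

alpha, beta = ransac_line(data, threshold=0.2, seed=1)

# Repeat the estimation with different seeds and summarize the slope,
# mirroring (at toy scale) the repeated-estimation strategy.
betas = [ransac_line(data, threshold=0.2, seed=s)[1] for s in range(20)]
mean_beta = sum(betas) / len(betas)
print(round(alpha, 3), round(beta, 3), round(mean_beta, 3))
```

Because the consensus set excludes the gross outliers before the final OLS refit, the recovered slope stays close to that of the clean points, whereas a plain OLS fit over all points would be pulled toward the outliers.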

Figure 4 shows the histogram of the estimator $\beta$. The average $\beta$ is 0.98, while the
most frequent value is 1.0031. The average estimator $\alpha$ of the intercept is 1.1425 ± 0.0007.
Fig. 4. Distribution of $\beta$ obtained from 10,000 estimations of the RANSAC algorithm, using
OLS as the underlying model.


These results are quite close to the theoretical prediction of (2.57). 


To perform a second test, we consider whether (2.35) and (2.55) may be consistent
with the data. For this, we concentrate on two conditional probabilities: i)
$\Pr(L_i \mid k_i/\lambda_i)$ for the number of employed individuals at a firm, given the firm is
characterized by the ratio $k_i/\lambda_i$, and ii) $\Pr(U_i \mid k_i/\xi_i)$ for the number of unemployed
individuals at a firm, given the firm is characterized by the ratio $k_i/\xi_i$. In particular,
we want to learn whether the basic predictions contained in (2.35) and (2.55) are
satisfied, i.e., that $\langle L_i \rangle \sim k_i/\lambda_i$ and $\langle U_i \rangle \sim k_i/\xi_i$.

In Figs. 5 and 6, we present contour and 3-dimensional plots of
$\log_{10}[\Pr(L_i \mid k_i/\lambda_i)/\Pr(L_i^* \mid k_i/\lambda_i)]$ and
$\log_{10}[\Pr(U_i \mid k_i/\xi_i)/\Pr(U_i^* \mid k_i/\xi_i)]$, respectively. Here, $\Pr(L_i^* \mid k_i/\lambda_i)$ and $\Pr(U_i^* \mid k_i/\xi_i)$
correspond to the probabilities associated with the conditional modes of $L_i$ and $U_i$.
The reason to plot the ratios just defined is that (2.35) and (2.55) are concerned with
averages rather than distributions, and we therefore must devise a way to relate the
empirical analysis with our predictions. To interpret the plots, we introduce a line of
slope 1 (linear relation) in Figs. 5(a) and 6(a). Such lines, by definition, scale as $k_i/\lambda_i$
and $k_i/\xi_i$. The relevant feature that the plots show is that these lines run parallel to
the contour for the largest value of $\Pr(L_i \mid k_i/\lambda_i)$ and $\Pr(U_i \mid k_i/\xi_i)$, or in other words,
$L_i^* \sim k_i/\lambda_i$ and $U_i^* \sim k_i/\xi_i$. Figures 5(b) and 6(b), showing in 3 dimensions the
surface of the logarithm of the distribution ratios, reinforce our interpretation: in these
plots the location of the maxima for the surfaces is cut by the planes that have been
constructed to coincide with the linear maps of Figs. 5(a) and 6(a). These relations
hold for small and intermediate values of $k_i/\lambda_i$ and $k_i/\xi_i$, but eventually fail for the
largest values of both ratios, probably due to poorer sampling at such large values of
$k_i/\lambda_i$ and $k_i/\xi_i$.
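A simplified numerical counterpart of this graphical test can be sketched as follows. Instead of plotting the ratio of distributions, the sketch bins firms logarithmically by the conditioning ratio, takes the modal count in each bin, and measures the slope of the modes against the bin centers on logarithmic axes; a slope near 1 corresponds to the linear scaling discussed above. The function names, the binning resolution, and the toy data are assumptions made for illustration, not part of the analysis reported here.

```python
import math
from collections import Counter, defaultdict


def conditional_modes(ratios, counts, bins_per_decade=4):
    """Return (bin center, modal count) pairs for logarithmic bins of `ratios`."""
    by_bin = defaultdict(Counter)
    for ratio, count in zip(ratios, counts):
        if ratio > 0:
            bin_index = math.floor(bins_per_decade * math.log10(ratio))
            by_bin[bin_index][count] += 1
    return [
        (10 ** ((b + 0.5) / bins_per_decade), by_bin[b].most_common(1)[0][0])
        for b in sorted(by_bin)
    ]


def loglog_slope(pairs):
    """Least-squares slope of log10(y) against log10(x)."""
    pts = [(math.log10(x), math.log10(y)) for x, y in pairs if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Toy data: three hypothetical firms per ratio value, whose most common
# employee count is twice the ratio (a purely linear relation).
ratios, counts = [], []
for r in (2, 20, 200, 2000):
    ratios += [r, r, r]
    counts += [2 * r, 2 * r, 2 * r + 1]

modes = conditional_modes(ratios, counts)
slope = loglog_slope(modes)
print(modes, slope)
```

With real data the bins at the largest ratios contain few firms, so the modal count there is poorly determined, which is consistent with the loss of agreement noted for the largest values of both ratios.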


The results presented in this section focus on a very simple comparison and, 
notwithstanding the partial differences we encounter between our equations and the 
measurements, support the plausibility of our model as a way to explain job mobility. 
Fig. 5. Behavior of $\log_{10}[\Pr(L_i \mid k_i/\lambda_i)/\Pr(L_i^* \mid k_i/\lambda_i)]$ from 49854 firms. (a) Contour plot
together with a linear map $a k_i/\lambda_i$, with $a$ a numerical constant. This line is close to parallel to the
contour line of the maximum of the ratio of distributions, which indicates that the linear relationship
$L_i^* \approx a k_i/\lambda_i$ is plausible for a range of values of $k_i/\lambda_i$. (b) The corresponding 3-dimensional plot of
the ratio of probabilities. The plane coincides with the line drawn in (a), defined by the parametric
representation $(k_i/\lambda_i, k_i/\lambda_i, \beta)$ where $\beta$ is a free parameter.


Fig. 6. Behavior of $\log_{10}[\Pr(U_i \mid k_i/\xi_i)/\Pr(U_i^* \mid k_i/\xi_i)]$ from 49854 firms. The interpretation of
this plot is analogous to that of Fig. 5. (a) Contour plot together with a linear map $a k_i/\xi_i$, with $a$
a numerical constant. This line is close to parallel to the contour line of the maximum of the ratio
of distributions, which indicates that the linear relationship $U_i^* \approx a k_i/\xi_i$ is plausible for a range of
values of $k_i/\xi_i$. (b) The corresponding 3-dimensional plot of the ratio of probabilities. The plane
coincides with the line drawn in (a), defined by the parametric representation $(k_i/\xi_i, k_i/\xi_i, \beta)$ where
$\beta$ is a free parameter.




4. Conclusion. Detailed high-resolution data on employment at large scale is
rapidly becoming available, and this provides an opportunity to revisit the way in
which job mobility and labor flows are studied. In particular, it makes it possible
to move away from aggregate models that, while very useful, have been
unable to address some important outstanding problems, such as the construction
of realistic shock scenarios, which are necessary if one is to attempt to design real-time
forecasting models of high-resolution employment flow. This task, which has
not yet been possible, may be within our reach for the first time, with considerable
potential value for economic policy design that is well grounded empirically and for
which impacts can be forecast in great detail.

In this manuscript, we have introduced a new basic framework that takes into 
account the role of firms in employment, and makes extensive use of real data. By 
performing a number of tests, we have been able to see that indeed the model behaves 
in similar ways to the data. Furthermore, we have provided the basic ingredients for 
algorithms to calculate the average numbers of employed and unemployed agents 
associated with a firm. The notion of firm-specific unemployment, which we have
introduced here, is a new concept that allows us to keep track of the information
that is implicitly contained in the fact that an agent has held a job in a certain firm,
indicating that agent’s affinity to some firms of the economy but not others.

An interesting consequence arising from (2.57) is that in the steady state the 
numbers of employed and unemployed agents of a firm are not independent of each 
other and therefore, firms that have large numbers of employees could contribute large 
numbers of unemployed people if the ratio between the average times of employment 
and post-employment search is low. This is a question worth further exploration. 
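One way to see why the two populations are tied together is sketched below. It assumes that (2.57) expresses the steady-state balance between separations and hires for the individuals associated with firm $i$, which is the relation implied by the scalings $\langle L_i \rangle \sim k_i/\lambda_i$ and $\langle U_i \rangle \sim k_i/\xi_i$ tested in Section 3.4; the numerical values are purely illustrative.

```latex
\[
  \lambda_i \langle L_i \rangle = \xi_i \langle U_i \rangle
  \quad\Longrightarrow\quad
  \langle U_i \rangle
  = \frac{\lambda_i}{\xi_i}\,\langle L_i \rangle
  = \frac{1/\xi_i}{1/\lambda_i}\,\langle L_i \rangle .
\]
```

Here $1/\lambda_i$ and $1/\xi_i$ play the role of the average employment and search times, so a low ratio of employment time to search time means a large $\langle U_i \rangle$ for a given $\langle L_i \rangle$. For instance, with $\lambda_i = 0.02$ and $\xi_i = 0.1$ per time step, a firm with 500 employees would be associated with 100 unemployed individuals.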

Finally, our introduction of a framework based on random walks on graphs to 
study job mobility can be a useful development. Random walks on graphs have a 
considerably long history, and a great deal is known about them (see, e.g. [10] for a 
review). Being able to deploy such a toolkit on questions regarding employment may 
lead to new results with potential academic and practical impact. 